Abstract

This paper develops a logical language for representing probabilistic causal laws. Our
interest in such a language is twofold. First, it can be motivated as a fundamental study 
of the representation of causal knowledge. Causality has an inherent dynamic aspect, 
which has been studied at the semantical level by Shafer in his framework of probability 
trees. In such a dynamic context, where the evolution of a domain over time is considered, 
the idea of a causal law as something which guides this evolution is quite natural. In 
our formalization, a set of probabilistic causal laws can be used to represent a class of 
probability trees in a concise, flexible and modular way. In this way, our work extends 
Shafer's by offering a convenient logical representation for his semantical objects. 

Second, this language also has relevance for the area of probabilistic logic programming. 
In particular, we prove that the formal semantics of a theory in our language can be equiv- 
alently defined as a probability distribution over the well-founded models of certain logic 
programs, rendering it formally quite similar to existing languages such as ICL or PRISM. 
Because we can motivate and explain our language in a completely self-contained way as 
a representation of probabilistic causal laws, this provides a new way of explaining the 
intuitions behind such probabilistic logic programs: we can say precisely which knowledge 
such a program expresses, in terms that are equally understandable by a non-logician. 
Moreover, we also obtain an additional piece of knowledge representation methodology for 
probabilistic logic programs, by showing how they can express probabilistic causal laws. 

KEYWORDS: Uncertainty, Causality, Probabilistic Logic Programming 



1 Introduction 

Logic based languages, such as logic programming, play an important role in knowl- 
edge representation. One of the known weaknesses of such languages is that they 
are not well suited for representing probabilistic or uncertain knowledge. This has 
prompted a significant amount of research into probabilistic logic programming
languages, both in the knowledge representation community itself and in machine
learning, where such languages are developed for the purpose of stochastic relational
learning. 

Syntactically, such a language typically annotates a logic programming rule, 
or some part thereof, with a probability; the formal semantics of the language 
then somehow specifies a probability distribution — typically over a set of possible 
worlds — in terms of these individual probabilities. This is the way in which these 
probabilistic logic programming languages tend to be formally defined. However, 
such a formal definition still leaves one important question unanswered, namely that 
of how expressions in the language should be understood on the informal level, i.e.,
how would one explain their intuitive meaning to a non-logician? 

For the two separate components of logic programming and probability, this
question has of course already been addressed at length. For instance, the informal 
meaning of logic programs — and in particular its negation-as-failure connective — 
has been explained among others in epistemic terms, referring to the beliefs of a 
rational agent (Gelfond and Lifschitz 1991), and in terms of the well-known
mathematical concept of an inductive definition (Denecker 1998). The meaning of
statements in probability calculus, on the other hand, has been explained among
others in frequentist terms, e.g. (Venn 1866), and in terms of degrees of belief, e.g.
(De Finetti 1937).

So far, research on probabilistic logic programming languages has not yet paid a 
great deal of attention to this issue of the informal meaning of expressions. It tends 
to be assumed that one already has sufficient intuitions about the meaning of logic 
programs and that the probabilities can simply be tacked on top of that. This paper 
presents an effort to develop a probabilistic logic programming language whose
informal semantics¹ is explained in full detail in a completely self-contained way.
In general, the advantage of such an approach is that it gives more philosophical 
insight into the meaning of statements in the language, makes it easier to explain 
it to domain experts, and can help to provide a better modeling methodology for 
it. 

One of the key tasks that such an effort needs to accomplish is to show con- 
vincingly that the formal semantics of the language indeed correctly captures the 
informal meaning that is attributed to its expressions, i.e., that these expressions 
indeed mean — formally — what we claim they — intuitively — mean. To ensure that 
this is done properly, we will adopt a constructive approach, where we first describe 
a particular kind of knowledge that we want to represent, then show how we can 
formalise the meaning of this knowledge in a way which is straightforward enough 
for its correctness to be intuitively obvious, and finally prove that the language we 
have thus defined is actually equivalent to a certain probabilistic logic programming 
construction. 

1 The informal semantics of a language is commonly also referred to as its "intuitive reading".
We prefer the term "informal semantics", however, because it stresses the close relation that
there (should) exist(s) to the formal semantics.

John and Mary are each holding a rock. With probability 0.5, Mary will throw her rock at
a window. This will break the window with probability 0.8. John will then also throw his
rock at the window. He will hit it with probability 0.6.

Fig. 1. The story of John and Mary.

Fig. 2. Probability tree for the window breaking story.

The language that we develop will attempt to formalise probabilistic causal laws.
The use of causal laws to compactly represent domains is commonplace in various
action languages, related to logic programming, e.g. (Gelfond and Lifschitz 1993).
Here, we will investigate a probabilistic variant of such laws. We will do this in
the semantic context developed by Shafer (1996). In this work, Shafer presents his
view on a number of fundamental causal and probabilistic concepts. His central
hypothesis is that such concepts are best considered in an explicitly dynamic context:
when speaking of probability or causality, we should do so, he says, in the context
of a particular story about how the domain evolves, which he formalises by means
of probability trees. As he himself puts it:

A full understanding of probability and causality requires a language for talking about 
the structure of contingency — a language for talking about the step-by-step unfolding of 
events. This book develops such a language based on an old and simple yet general and 
flexible idea: the probability tree. 

Figure 2 depicts a probability tree corresponding to the story shown in Figure 1.
In natural language, we could say that such a tree paints the following picture. The 
domain starts out in an initial state. Then, some event happens, which causes the 
domain to transition to a new state. However, we do not know up front precisely
which new state this is going to be; instead, the new state is chosen
probabilistically from a set of alternatives. For instance, in the initial state of the tree
in Figure 2, the event happens that Mary makes up her mind whether to throw,
which leads to either a state in which she does or a state in which she does not. 
This step is then repeated — that is, in the new state, a different event happens, 
which leads to another new state, again chosen probabilistically from some set of 
alternatives — until finally this process arrives at a final state, in which no further 
events happen. 

Throughout this paper, we will continue to talk about probability trees using the
language introduced above. In particular, we will carry on using the word event to
refer to the occurrence that causes a transition from one state to the next. This 
use of the term differs from its standard use in probability theory, where it denotes 
a set of possible outcomes of an experiment. Shafer introduces the terms Humean
event (that which causes a transition between states) and Demoivrian event (a set 
of possible outcomes) to distinguish between these two different meanings of the 
word. Using the chain rule, the probability of following any particular branch in 
this tree can be computed as the product of the probabilities of the individual 
edges. For instance, the probability of the left-most branch of the tree in Figure 2 is
$0.5 \cdot 0.8 \cdot 1 \cdot 0.6 = 0.24$. A Demoivrian event corresponds to a set of branches of the
tree. For instance, the Demoivrian event of the window being broken corresponds 
to the set of all branches in which it breaks, i.e., the first, second, third and fifth 
branch. The probability of such a Demoivrian event can be computed as the sum 
of the probabilities of all branches that belong to it, e.g., the probability of the 
window breaking is 0.24 + 0.16 + 0.06 + 0.3 = 0.76. In the rest of this paper, we 
reserve the term "event" for Humean events (i.e., transitions between states) and 
will therefore omit the modifier. 
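
As a worked check of these numbers, the six branches can be written out explicitly. This is only a sketch: it assumes the branch order implied by the text (Mary's decision, then her rock, then John's throw and his rock), since the figure itself is not reproduced here, and the labels $b_1, \ldots, b_6$ are introduced purely for this illustration.

```latex
\begin{align*}
P(b_1) &= 0.5 \cdot 0.8 \cdot 1 \cdot 0.6 = 0.24 && \text{(window broken)}\\
P(b_2) &= 0.5 \cdot 0.8 \cdot 1 \cdot 0.4 = 0.16 && \text{(window broken)}\\
P(b_3) &= 0.5 \cdot 0.2 \cdot 1 \cdot 0.6 = 0.06 && \text{(window broken)}\\
P(b_4) &= 0.5 \cdot 0.2 \cdot 1 \cdot 0.4 = 0.04 && \text{(window intact)}\\
P(b_5) &= 0.5 \cdot 1 \cdot 0.6 = 0.30 && \text{(window broken)}\\
P(b_6) &= 0.5 \cdot 1 \cdot 0.4 = 0.20 && \text{(window intact)}\\[4pt]
P(\text{window broken}) &= P(b_1) + P(b_2) + P(b_3) + P(b_5) = 0.76
\end{align*}
```

The six branch probabilities sum to 1, as they should for a complete probability tree.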

This paper will develop a language for describing the causal laws according to 
which a probability tree unfolds. In other words, we will assume that each event in 
such a tree happens for a reason, i.e., that it is actually caused by some particular 
property of the state in which it happens. We then construct a language that allows
us to describe these reasons. In the extreme case, it might be that we can say nothing
more than that each state of the tree is in itself the reason for the event that happens 
there; in this case, we obtain nothing more than an alternative representation for 
the tree itself. However, if the same event can happen in different parts of the tree, 
each time caused by the same property of the state in which it happens, we might 
end up with a significantly more compact representation. 

In the story in Figure 1, we find four probabilistic causal laws:

• John throwing his rock causes the window to break with probability 0.6; 

• Mary throwing her rock causes the window to break with probability 0.8; 

• Mary decides to throw with probability 0.5 (this event is vacuously caused); 

• John always throws (this event is also vacuously caused and it has only one 
possible outcome). 

In the language that we will develop in the next section, we will write down such 
probabilistic causal laws in the format:

$$\textit{possible effects} \leftarrow \textit{cause}$$

where the cause can be omitted if the event is vacuously caused. In this syntax,
the above probabilistic causal laws can be written down as:

$$(\mathit{Break} : 0.8) \leftarrow \mathit{Throws}(\mathit{Mary}). \tag{1}$$
$$(\mathit{Break} : 0.6) \leftarrow \mathit{Throws}(\mathit{John}). \tag{2}$$
$$(\mathit{Throws}(\mathit{Mary}) : 0.5). \tag{3}$$
$$\mathit{Throws}(\mathit{John}). \tag{4}$$

Fig. 3. Alternate probability tree for the window breaking story.

In this representation, John and Mary each get their own probabilistic causal 
law. This is necessary because they throw differently, causing the window to
break with different probabilities. However, we can also imagine an example in which
both hit the window with the same probability. In this case, our language also 
allows a more compact representation, using a variable to range over the different 
persons that might throw: 

$$\forall x\ (\mathit{Break} : 0.8) \leftarrow \mathit{Throws}(x).$$

The meaning of such a statement is as one would expect: each particular person 
that throws (i.e., each instantiation of x for which Throws(x) holds) breaks the window
with probability 0.8.

If we compare our four probabilistic causal laws to the probability tree in Figure 
2, we see that this tree indeed obeys the causal laws, in the following sense:

• As we go down any branch of the tree, we find that all events which should 
happen according to our causal laws actually do so. The two unconditional 
causal laws (3) and (4) state that the events that Mary decides whether she
will throw and that John decides that he will throw should always happen;
and indeed, we find that in each branch of the tree, they do. In the left-most
branch of the tree, for instance, the result of these two events is that both
Mary and John decide to throw, so according to causal laws (1) and (2), the
two events by which their respective throws break the window should also 
happen; and again, they indeed do. 

• The events that happen according to the causal laws are also the only events
that happen. For instance, in the right-most branch, Mary has decided not 
to throw, so the event of her rock breaking the window with probability 0.8 
does not happen. Moreover, each of the events which should happen happens 
precisely once; it is not the case, for instance, that once Mary has decided to 
throw, the event of her rock breaking the window with probability 0.8 keeps 
on happening ad infinitum. 

To define the semantics of our language, we will formalise this idea of a
probability tree obeying a set of probabilistic causal laws. We will call such an obeying
probability tree an execution model of the set of causal laws.

In general, a single set of causal laws might have many such execution models. 
Indeed, it is clear that a probability tree contains more information than the causal 
laws: events in the tree are totally ordered (for instance, in Figure 2, Mary decides
to throw before John does), whereas the causal laws only provide a partial order 
on the events (for instance, the event of Mary's rock hitting the window can only 
happen after the event of Mary deciding to throw, since one causes the other; 
however, no order is imposed between e.g. the events of John throwing and Mary 
throwing). However, the additional information that is contained in a probability 
tree is actually irrelevant for the final outcome that will be reached. Let us consider, 
for instance, the alternative tree of Figure 3, in which John throws before Mary
does. This tree also obeys our four causal laws, yet has a different structure than
the tree in Figure 2. However, we see that the probability of the window eventually
breaking is nevertheless precisely the same in this tree as it was in the other one, 
namely 0.76. Later in this paper, we will prove that all execution models of a 
set of causal laws always generate precisely the same probability distribution. In 
this sense, causal laws manage to capture the essence of a probability tree, while 
allowing irrelevant details (e.g. does John throw first or does Mary throw first?) to 
be ignored. This renders our representation quite succinct.
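
The order-independence claim can be checked mechanically on the running story. The following is a minimal sketch, not the formal semantics defined later in the paper: it assumes that each law fires at most once, that a cause is a single atom (or absent), and that the next event is always the first applicable law in a fixed priority list. The names `LAWS` and `final_distribution` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

# Each law is (name, cause atom or None, effect atom, probability of the effect).
LAWS = [
    ("law1", "Throws(Mary)", "Break", Fraction(8, 10)),
    ("law2", "Throws(John)", "Break", Fraction(6, 10)),
    ("law3", None, "Throws(Mary)", Fraction(1, 2)),
    ("law4", None, "Throws(John)", Fraction(1)),
]

def final_distribution(priority):
    """Distribution over final states when, in every state, the first
    applicable law of `priority` that has not fired yet is the next event."""
    dist = {}

    def unfold(state, fired, prob):
        applicable = [law for law in priority
                      if law[0] not in fired
                      and (law[1] is None or law[1] in state)]
        if not applicable:
            # No event can happen any more: this is a leaf of the tree.
            dist[state] = dist.get(state, 0) + prob
            return
        name, _, effect, p = applicable[0]
        # Child in which the effect occurs.
        unfold(state | {effect}, fired | {name}, prob * p)
        # Child in which the event happens without its effect.
        if p < 1:
            unfold(state, fired | {name}, prob * (1 - p))

    unfold(frozenset(), frozenset(), Fraction(1))
    return dist

distributions = [final_distribution(list(order)) for order in permutations(LAWS)]
p_break = sum(pr for state, pr in distributions[0].items() if "Break" in state)
print(p_break)
```

Under these assumptions, all 24 priority orders yield the same distribution over final states, and the window ends up broken with probability 19/25 = 0.76. Priority orders cover only some of the possible execution models, so this is an illustration rather than a substitute for the general result.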

Shafer's book also recognizes that the naive graphical representation of probabil- 
ity trees tends to grow unwieldy rather quickly. In the final chapter of his book, he 
therefore briefly examines a number of alternative, more compact representations 
for such trees, including Bayesian nets (Pearl 1988) and a representation based
on Martin-Löf type theory (Martin-Löf 1982). In both these representations, new
events are caused by the outcomes of some fixed set of previous events. The de- 
scription of an event itself therefore already carries within it certain restrictions 
on the order in which events can happen. By contrast, in CP-logic, events are not
caused directly by previous events, but rather by properties of the state in which 
they happen. The fact that we do not represent any explicit a priori information 
about the order in which events happen makes our representation more flexible 
and allows probabilistic causal laws to easily be reused in different contexts. Let us 
suppose, for instance, that we know a probabilistic causal law that describes one 
particular way in which a certain disease can cause a certain symptom. This law can
then be reused, without change, regardless of what might cause the disease, which 
other causes there might be for the same symptom, or even whether there are still 
other ways in which the same disease might also cause the same symptom. 

This paper is structured as follows. Section 2 briefly introduces some preliminary
concepts from lattice theory and also logic programming. In Section 3, we formally
define an initial, restricted version of CP-logic. In Section 4, we show how a certain
kind of process can be modeled in this basic language, which also suggests a way of
defining a more general version of CP-logic. This will be done in Section 5. Section
6 then discusses the resulting definitions in more detail. In Section 7, we investigate
the precise relation between CP-logic and Bayesian networks. Section 8 relates
CP-logic to logic programming. Finally, Section 9 discusses some related work. Proofs
of the theorems presented in this paper are given in Appendix A.

Part of the material in this paper was presented at conferences (Vennekens et al. 2004)
and (Vennekens et al. 2006).

2 Preliminaries 

This section recalls some well-known definitions and results from lattice theory and 
logic programming. To a large extent, this material is relevant only for the proofs 
of our theorems. It can safely be skipped on a first reading of this paper. 

2.1 Some concepts from lattice theory 

A binary relation $\leq$ on a set $L$ is a partial order if it is reflexive, transitive and
anti-symmetric. A partially ordered set $(L, \leq)$ is a lattice if every pair $(x, y)$ of
elements of $L$ has a unique least upper bound and greatest lower bound. Such a
lattice $(L, \leq)$ is complete if every non-empty subset $S \subseteq L$ has a least upper bound
and greatest lower bound. A complete lattice has a least element $\bot$ and a greatest
element $\top$. An operator $O : L \to L$ is monotone if for every $x \leq y$, $O(x) \leq O(y)$.
An element $x \in L$ is a prefixpoint of $O$ if $x \geq O(x)$, a fixpoint if $x = O(x)$ and a
postfixpoint if $x \leq O(x)$. If $O$ is a monotone operator on a complete lattice, then for
every postfixpoint $y$, there exists a least element in the set of all prefixpoints $x$ of $O$
for which $x \geq y$. This least prefixpoint greater than $y$ of $O$ is also the least fixpoint
greater than $y$ of $O$. Moreover, it can be constructed by successively applying $O$ to
$y$, i.e., as the limit of the sequence $(y, O(y), O(O(y)), \ldots)$. In particular, because $\bot$
is a trivial postfixpoint, $O$ has a least prefixpoint which is equal to its least fixpoint
and which can be constructed by successive application of $O$ to $\bot$.
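
The construction by successive application can be sketched on a small finite lattice. The example below assumes the powerset of a five-element set ordered by inclusion, where the empty set plays the role of the least element; the operator `reach_from_a` and the edge relation `EDGES` are hypothetical and serve only as an illustration. On a finite lattice the iteration stops after finitely many steps; in general the limit may be transfinite.

```python
def least_fixpoint(op, start=frozenset()):
    """Iterate a monotone operator from a postfixpoint until it stabilises.
    Assumes a finite lattice, so that the sequence start, op(start), ... ends."""
    current = start
    while True:
        nxt = op(current)
        if nxt == current:
            return current
        current = nxt

# Hypothetical monotone operator on subsets of {a, b, c, d, e}:
# it returns 'a' together with every direct successor of an element of s.
EDGES = {("a", "b"), ("b", "c"), ("d", "e")}

def reach_from_a(s):
    return frozenset({"a"}) | frozenset(y for (x, y) in EDGES if x in s)

# Sequence from the bottom element: {} , {a}, {a, b}, {a, b, c}, {a, b, c}.
lfp = least_fixpoint(reach_from_a)
print(sorted(lfp))
```

Starting instead from the postfixpoint `frozenset({"a"})` reaches the same set here, since it lies below the least fixpoint.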

2.2 Some concepts from logic programming 

We assume familiarity with classical logic. A Herbrand interpretation for a
vocabulary $\Sigma$ is an interpretation which has as its domain the set $HU(\Sigma)$ of all ground
terms that can be constructed from $\Sigma$ and which interprets each constant as
itself and each function symbol $f/n$ as the function mapping a tuple $(t_1, \ldots, t_n)$ to
$f(t_1, \ldots, t_n)$. We can identify a Herbrand interpretation with a set of ground atoms.
A partial Herbrand interpretation is a function $\nu$ from the set $HB(\Sigma)$ of all ground
atoms, also called the Herbrand base, to the set of truth values $\{\mathbf{f}, \mathbf{u}, \mathbf{t}\}$. A (total)
Herbrand interpretation corresponds to a partial Herbrand interpretation that does
not include $\mathbf{u}$ in its range. On the set of truth values, one defines the precision order

$$\mathbf{u} \leq_p \mathbf{f} \quad \text{and} \quad \mathbf{u} \leq_p \mathbf{t}$$

and the truth order:

$$\mathbf{f} \leq_t \mathbf{u} \leq_t \mathbf{t}.$$

These orders can be pointwise extended to partial Herbrand interpretations. Each
totally ordered set $S$ of partial Herbrand interpretations has a $\leq_p$-least upper bound
denoted $lub_{\leq_p}(S)$. The three-valued truth function $\phi^\nu$ for sentences $\phi$ and partial
Herbrand interpretations $\nu$ is defined by induction:

• $p^\nu = \nu(p)$, for $p \in HB(\Sigma)$;

• $(\psi \land \phi)^\nu = Min_{\leq_t}(\psi^\nu, \phi^\nu)$;

• $(\forall x\ \phi(x))^\nu = Min_{\leq_t}(\{\phi(t)^\nu \mid t \in HU(\Sigma)\})$;

• $(\neg\phi)^\nu = (\phi^\nu)^{-1}$ where $\mathbf{f}^{-1} = \mathbf{t}$, $\mathbf{t}^{-1} = \mathbf{f}$, $\mathbf{u}^{-1} = \mathbf{u}$.

A crucial monotonicity property of three-valued logic is that $\nu \leq_p \nu'$ implies
$\phi^\nu \leq_p \phi^{\nu'}$.
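
The inductive definition above can be sketched for ground formulas as follows. The encoding is an assumption made for this illustration only: formulas are nested tuples, truth values are integers chosen so that the integer order coincides with the truth order, and atoms absent from the interpretation count as unknown. Over a finite Herbrand universe, a universally quantified formula would be handled as the conjunction of its instances.

```python
# Integers chosen so that F < U < T mirrors the truth order f <=_t u <=_t t.
F, U, T = 0, 1, 2

def evaluate(formula, nu):
    """Three-valued value of a ground formula under a partial interpretation.
    Formulas: ("atom", p), ("not", f) or ("and", f1, f2, ...)."""
    kind = formula[0]
    if kind == "atom":
        return nu.get(formula[1], U)
    if kind == "not":
        return 2 - evaluate(formula[1], nu)  # swaps f and t, keeps u
    if kind == "and":
        return min(evaluate(sub, nu) for sub in formula[1:])
    raise ValueError(f"unknown connective: {kind}")

def at_least_as_precise(a, b):
    """True when a <=_p b: u lies below f and t, which are incomparable."""
    return a == U or a == b

# The formula p AND NOT q under a coarse and a more precise interpretation.
body = ("and", ("atom", "p"), ("not", ("atom", "q")))
coarse = {"p": T}            # q is unknown
fine = {"p": T, "q": F}      # extends `coarse`
print(evaluate(body, coarse), evaluate(body, fine))
```

Here the value moves from unknown to true as the interpretation becomes more precise, which is one instance of the monotonicity property.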

The well-founded semantics of logic programs was originally defined in (Van Gelder et al. 1991).
We present an equivalent definition that was developed in (Denecker and Vennekens 2007).
Formally, a logic program $P$ is a set of rules of the form $p \leftarrow \phi$, where $p$ is a ground
atom and $\phi$ is a first-order sentence.

Definition 1 (well-founded induction)

We define a well-founded induction of $P$ as a sequence of partial Herbrand
interpretations $(\nu^\alpha)_{0 \leq \alpha \leq \beta}$ satisfying the following conditions:

• $\nu^0 = \bot_{\leq_p}$, the mapping of all atoms to $\mathbf{u}$;

• $\nu^\lambda = lub_{\leq_p}(\{\nu^\delta \mid \delta < \lambda\})$, for each limit ordinal $\lambda$;

• $\nu^{\alpha+1}$ relates to $\nu^\alpha$ in one of the following ways:

- $\nu^{\alpha+1} = \nu^\alpha[p : \mathbf{t}]$ such that for some rule $p \leftarrow \phi$ in $P$, $\phi^{\nu^\alpha} = \mathbf{t}$;

- $\nu^{\alpha+1} = \nu^\alpha[U : \mathbf{f}]$ where $U$ is an unfounded set, i.e., a set of ground
atoms such that for each $p$ in $U$, $\nu^\alpha(p) = \mathbf{u}$ and for each rule $p \leftarrow \phi$ in
$P$, $\phi^{\nu^{\alpha+1}} = \mathbf{f}$.

A well-founded induction is a sequence of increasing precision. We call a
well-founded induction $(\nu^\alpha)_{\alpha \leq \beta}$ terminal if it cannot be extended with a strictly more
precise interpretation. Each well-founded induction whose limit is a total interpre- 
tation is terminal. We now define the well-founded model of P as the limit of any 
such terminal well-founded induction. As the following result shows, this definition 
coincides with the standard one. 

Theorem 1

(Denecker and Vennekens 2007) Each terminal well-founded induction of $P$ converges
to the well-founded model of $P$, as it was defined in (Van Gelder et al. 1991).
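
For finite ground programs, one terminal well-founded induction can be computed directly. The sketch below is a simplification under stated assumptions: rule bodies are restricted to conjunctions of literals rather than arbitrary first-order sentences, atoms are made true one at a time, and the largest unfounded set is falsified in a single step. The function names and the example program are hypothetical.

```python
T, F, U = "t", "f", "u"

def body_value(body, nu):
    """Three-valued value of a conjunction of literals (atom, is_positive)."""
    values = []
    for atom, positive in body:
        v = nu[atom]
        if v != U and not positive:
            v = T if v == F else F
        values.append(v)
    if F in values:
        return F
    return U if U in values else T

def greatest_unfounded_set(program, nu):
    """Largest set of unknown atoms all of whose rule bodies are false
    once the whole set is assumed false."""
    candidates = {a for a, v in nu.items() if v == U}
    changed = True
    while changed:
        changed = False
        trial = {a: (F if a in candidates else v) for a, v in nu.items()}
        for head, body in program:
            if head in candidates and body_value(body, trial) != F:
                candidates.discard(head)
                changed = True
    return candidates

def well_founded_model(program, atoms):
    nu = {atom: U for atom in atoms}          # least precise interpretation
    while True:
        derivable = [h for h, body in program
                     if nu[h] == U and body_value(body, nu) == T]
        if derivable:
            nu[derivable[0]] = T              # step of the first kind
            continue
        unfounded = greatest_unfounded_set(program, nu)
        if not unfounded:
            return nu                         # the induction is terminal
        for atom in unfounded:
            nu[atom] = F                      # step of the second kind

# a.   b <- a, not c.   c <- d.   d <- c.   e <- not f.   f <- not e.
PROGRAM = [
    ("a", []),
    ("b", [("a", True), ("c", False)]),
    ("c", [("d", True)]),
    ("d", [("c", True)]),
    ("e", [("f", False)]),
    ("f", [("e", False)]),
]
model = well_founded_model(PROGRAM, "abcdef")
print(model)
```

In this example the positive loop between `c` and `d` is unfounded and becomes false, which then lets `b` become true, while the mutual negation between `e` and `f` leaves both atoms unknown.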

In certain logic programming variants, such as abductive logic programs (Kakas et al. 1992)
and ID-logic (Denecker and Ternovska 2007), a distinction is made between
predicates that are defined by the program and predicates that are left open. The set of
defined predicates must contain at least those predicates that appear in the heads 
of rules of the program. This distinction is similar to that between endogenous 
and exogenous random variables, which is common in probabilistic modeling. It 
is straightforward to generalize the well-founded semantics to this case. Given an 
interpretation $O$ of the open predicates, we define a well-founded induction of $P$
in $O$ by the same inductive definition as for ordinary well-founded inductions, only
we now have as a base case that $\nu^0$ should be the least precise partial Herbrand
interpretation that extends $O$. It is easy to see that each $\nu^\alpha$ in such a well-founded
induction in O in fact extends O and also that if there are no open predicates, this 
definition simply coincides with the original one. The well-founded model of P in 
O is then the limit of any terminal well-founded induction of P in O. 



3 A logic of probabilistic causal laws 

Our goal in this section is to define a language for representing probabilistic causal 
laws. Before going into the mathematical details, we first outline the general picture. 

To represent knowledge in a logical language, the first thing that is needed is 
a suitable vocabulary. Usually in logical modeling, this vocabulary is assumed to 
be such that each possible state of the domain of discourse corresponds to an in- 
terpretation for it. In Shafer's probability trees, possible states of the domain are 
represented by nodes of the tree. To link these two formal settings, our seman- 
tics will therefore consider probability trees in which each node corresponds to an 
interpretation for a given vocabulary. 

As we introduced the concept in Section 1, a probabilistic causal law states the
cause and possible effects of a particular event or class of events. The cause specifies
in which nodes of the tree the event might happen, i.e., it is some property of the
domain of discourse such that the event can happen in precisely those states of the
domain in which this property holds. The natural thing, therefore, is to represent
such a cause by a first-order formula $\phi$, whose meaning is that the event might
happen in those states $s$ of a probability tree for which the associated interpretation
$\mathcal{I}(s)$ is such that $\mathcal{I}(s) \models \phi$.

Each event that happens makes a transition from a node s of a probability tree 
to one of the children s' of s. The description of the effects of such an event should 
specify how it will affect the state of the domain, i.e., what the interpretations 
$\mathcal{I}(s')$ associated to the children $s'$ of $s$ should be. There are many conceivable ways
of representing such knowledge, but in this paper we stick to a very simple one:
we assume that each possible effect of an event corresponds to a single ground
atom $P(t)$ of our vocabulary, such that the interpretation $\mathcal{I}(s')$ corresponding to
the new state $s'$ is identical to the interpretation $\mathcal{I}(s)$, apart from the fact that
$P(t)$ is now true. We choose this simple representation for two reasons. First, the
aim of our exercise is to come up with a semantics that formalises probabilistic 
causal laws in a way that clearly coincides with our intuitions about the meaning of 
such laws. Keeping the representation of effects simple helps to achieve the desired 
clarity. Second, we are not just interested in this language for its own sake, but 
also because we want to use it to explain the meaning of certain probabilistic logic 
programming statements. Our simple representation of effects will also serve to 
elucidate this link to logic programming.

3.1 Syntax

In this section, we formally define the language of CP-logic. Let us fix a finite 
relational vocabulary, consisting of a set of predicate symbols and a set of constants. 
We assume that the predicates of our vocabulary are split into a set of endogenous 
predicates and a set of exogenous ones. The idea behind this distinction is of course 
that the endogenous predicates should describe things that are internal to the causal 
process being modeled, while the exogenous predicates describe things external to 
it. 

A causal probabilistic law, or CP-law for short, is a statement of the form: 

$$\forall x\ (A_1 : \alpha_1) \lor \cdots \lor (A_n : \alpha_n) \leftarrow \phi. \tag{5}$$

where the $\alpha_i$ are non-zero probabilities with $\sum_i \alpha_i \leq 1$, $\phi$ is a first-order formula and
the $A_i$ are atoms, such that the universally quantified tuple of variables $x$ contains
all free variables in $\phi$ and the $A_i$. Moreover, the predicate symbol of each of the
atoms $A_i$ should be an endogenous predicate.
Such a CP-law is read as:

"For each $x$, $\phi$ causes an event whose effect is that at most one of the $A_i$ becomes true;
for each $i$, the probability of $A_i$ being the effect of this event is $\alpha_i$."

If the causal law has a deterministic effect, i.e., it causes some atom $A$ with
probability 1, we also write $A \leftarrow \phi$ instead of $(A : 1) \leftarrow \phi$. We allow the precondition
$\phi$ to be absent, meaning that the event is vacuously caused. In this case, the causal
law is called unconditional and we omit the '$\leftarrow$'-symbol as well. If the tuple $x$ of
variables is empty, we call the causal law ground. We remark that the precondition
$\phi$ of such a ground causal law may still contain variables, as long as they are all
bound by some quantifier in $\phi$.

A CP-theory is a finite multiset² of CP-laws. For now, we will restrict attention to
CP-theories in which all formulas $\phi$ are positive, i.e., they do not contain negation.
Afterwards, Section 5 will examine how negation can be added.

Example 1 

In about 25% of the cases, syphilis causes a neuropsychiatric disorder called general 
paresis, and in fact, syphilis is the only cause for paresis. This can be modeled as 
follows: 

$$(\mathit{Paresis} : 0.25) \leftarrow \mathit{Syphilis}. \tag{6}$$

where Syphilis is an exogenous predicate. This example illustrates the difference
between causation and material implication. Indeed, because syphilis is the only
cause for paresis, observing that a patient has paresis implies that he must also
have syphilis, i.e., the material implication $\mathit{Paresis} \supset \mathit{Syphilis}$ holds. So, in this
example, the directions of causation and material implication are precisely opposite. 



2 Example 3 explains why we consider multisets instead of sets.

Example 2

Our running example for this section will also be a medical example. Pneumonia 
might cause angina with probability 0.2. Vice versa, angina might cause pneumo- 
nia with probability 0.3. A bacterial infection can cause either pneumonia (with 
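probability 0.4) or angina (with probability 0.1). We consider bacterial infection as
exogenous.

$$(\mathit{Angina} : 0.2) \leftarrow \mathit{Pneumonia}. \tag{7}$$
$$(\mathit{Pneumonia} : 0.3) \leftarrow \mathit{Angina}. \tag{8}$$
$$(\mathit{Pneumonia} : 0.4) \lor (\mathit{Angina} : 0.1) \leftarrow \mathit{Infection}. \tag{9}$$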

Example 3 

A CP-theory is a multiset of CP-laws, which means that it may contain several in- 
stances of the same event. To illustrate this, consider a variant of the above problem 
in which the patient comes into contact with two different sources of infection, each 
of which might cause him to become infected with a probability of 0.1. To model 
this, we can add the following multiset of two unconditional events to the theory 
of Example 2:

$$(\mathit{Infection} : 0.1). \tag{10}$$
$$(\mathit{Infection} : 0.1). \tag{11}$$

We now define some notation to refer to different components of a ground CP-law. 
The head $head(r)$ of a rule $r$ of form (5) is the set of all pairs $(A_i, \alpha_i)$ appearing in
the description of the effects of the event; the body of $r$, $body(r)$, is its precondition
$\phi$. By $head_{At}(r)$ we denote the set of all atoms $A_i$ for which there exists an $\alpha_i$ such
that $(A_i, \alpha_i) \in head(r)$. Similarly, by $body_{At}(r)$ we will denote the set of all atoms $A$
which "appear"³ in $body(r)$. For the above Example 2, if $r$ is the CP-law (9), then
$head(r) = \{(\mathit{Pneumonia}, 0.4), (\mathit{Angina}, 0.1)\}$, $head_{At}(r) = \{\mathit{Pneumonia}, \mathit{Angina}\}$,
$body(r) = \mathit{Infection}$ and $body_{At}(r) = \{\mathit{Infection}\}$.
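
The notation just introduced can be mirrored by a small data structure. This is a sketch under a simplifying assumption: the body is restricted to a conjunction of ground atoms, whereas the language allows arbitrary first-order formulas. The class `CPLaw` and its method names are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class CPLaw:
    head: tuple        # pairs (atom, probability)
    body: tuple = ()   # atoms of the precondition; () means unconditional

    def __post_init__(self):
        probabilities = [p for _, p in self.head]
        if any(p <= 0 for p in probabilities) or sum(probabilities) > 1:
            raise ValueError("probabilities must be non-zero and sum to at most 1")

    def head_at(self):
        """Atoms occurring in the head."""
        return {atom for atom, _ in self.head}

    def body_at(self):
        """Atoms occurring in the body."""
        return set(self.body)

# CP-law (9) of Example 2.
law9 = CPLaw(head=(("Pneumonia", Fraction(2, 5)), ("Angina", Fraction(1, 10))),
             body=("Infection",))

# Laws (10) and (11) of Example 3 are syntactically identical, so the theory
# is kept as a list (a multiset); a Python set would merge them into one.
law10 = CPLaw(head=(("Infection", Fraction(1, 10)),))
theory = [law10, law10]
print(law9.head_at(), law9.body_at(), len(theory), len(set(theory)))
```

Storing the theory as a list rather than a set is what preserves the two separate sources of infection.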

We will call a CP-law $E \leftarrow \phi$ a rule if we want to emphasize that we are referring
to a syntactical construct.

3 More formally, we use $body_{At}(r)$ to denote $At(body(r))$, where $At$ is the mapping from sentences
to sets of ground atoms that is inductively defined by:

• For $Q(t)$ a ground atom, $At(Q(t)) = \{Q(t)\}$;

• For $\phi \circ \psi$, with $\circ$ either $\lor$ or $\land$, $At(\phi \circ \psi)$ is defined as $At(\phi) \cup At(\psi)$;

• For $\neg\phi$, $At(\neg\phi) = At(\phi)$;

• For $\Theta x\, \phi$, with $\Theta$ either $\forall$ or $\exists$, $At(\Theta x\, \phi) = \bigcup_{t \in HU(\Sigma)} At(\phi[x/t])$, where $HU(\Sigma)$ denotes
the Herbrand universe for the vocabulary $\Sigma$.



3.2 Semantics

This section defines the formal semantics of CP-logic. We will restrict attention
to Herbrand interpretations, i.e., we consider only interpretations whose domain is
the set of constants of the theory and which interpret each constant as itself. This
restriction is made for two reasons: it simplifies the presentation, and it is also what
is usually done in (probabilistic) logic programming. However, it is easy to extend
all our definitions and results to arbitrary domains.

We view a non-ground CP-law $\forall x\ r$ as an abbreviation for the set of all ground
CP-laws $r[x/t]$ that result from replacing the variables $x$ by a tuple $t$ of ground
terms in vocabulary $\Sigma$. For instance, if we wanted to consider multiple people in
Example 2, we might include constants $\{\mathit{John}, \mathit{Mary}\}$ in our vocabulary $\Sigma$ and
write the non-ground rule

$$\forall x\ (\mathit{Angina}(x) : 0.2) \leftarrow \mathit{Pneumonia}(x),$$

to abbreviate the two CP-laws

$$(\mathit{Angina}(\mathit{John}) : 0.2) \leftarrow \mathit{Pneumonia}(\mathit{John}).$$
$$(\mathit{Angina}(\mathit{Mary}) : 0.2) \leftarrow \mathit{Pneumonia}(\mathit{Mary}).$$

Because CP-theories are finite, the use of such abbreviations only makes sense in 
the context of a finite domain, i.e., when the vocabulary does not generate an 
infinite number of terms. By restricting attention to finite relational vocabularies, 
we ensure that this is the case. 
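The expansion of a non-ground rule into ground CP-laws can be sketched in a few lines of Python. This is only an illustration under our own assumptions: CP-laws are represented as plain strings, and the helper name `ground` and the `{x}` placeholder syntax are hypothetical, not part of CP-logic.

```python
from itertools import product

def ground(variables, constants, template):
    """Return one ground CP-law (as a string) per assignment of constants to variables.

    `template` is a format string with one {name} placeholder per variable
    (a hypothetical textual encoding chosen for this sketch).
    """
    laws = []
    for values in product(constants, repeat=len(variables)):
        binding = dict(zip(variables, values))
        laws.append(template.format(**binding))
    return laws

# The non-ground rule from the text, grounded over the constants {John, Mary}.
laws = ground(["x"], ["John", "Mary"], "(Angina({x}) : 0.2) <- Pneumonia({x}).")
for law in laws:
    print(law)
```

With k variables and m constants the grounding contains m^k laws, which is why a finite relational vocabulary is needed to keep the grounded theory finite.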

In our formal treatment of CP-logic, we will never consider non-ground rules, but 
always assume that these have already been expanded into a finite set of ground 
CP-laws. When using such non-ground rules in examples, we will implicitly assume 
that predicates and constants have been appropriately typed, in such a way as to 
avoid instantiations that are obviously not intended. We will also allow ourselves 
to use arithmetic function symbols, such as +/2 and -/2, and assume that the 
grounding replaces terms made from these symbols by numerical constants in the 
appropriate way. 

As already explained, our basic semantical object will be that of a probability 
tree in which the nodes correspond to interpretations. 

Definition 2 (probabilistic $\Sigma$-process)

Let $\Sigma$ be a vocabulary. A probabilistic $\Sigma$-process $\mathcal{T}$ is a pair $\langle T; \mathcal{I} \rangle$, where:

• $T$ is a tree structure, in which each edge is labeled with a probability, such
that for every non-leaf node $s$, the probabilities of all edges leaving $s$ sum up
to precisely 1;

• $\mathcal{I}$ is a mapping from nodes of $T$ to Herbrand interpretations for $\Sigma$.

In a probability tree, we can associate to each node $s$ the probability $\mathcal{P}(s)$ of
a random walk in the tree, starting from its root, passing through $s$. Indeed, for
the root $\bot$ of the tree, $\mathcal{P}(\bot) = 1$ and for each other node $s$, $\mathcal{P}(s) = \prod_i \alpha_i$ where
the $\alpha_i$ are all the probabilities associated to edges on the path from $\bot$ to
$s$. Essentially, the mapping $\mathcal{P}$ contains all the information that is present in the
labeling of the edges and vice versa. To ease notation, we will sometimes take the
liberty of identifying a probabilistic $\Sigma$-process $\langle T; \mathcal{I} \rangle$ with the triple $\langle T; \mathcal{I}; \mathcal{P} \rangle$ and
ignoring the labels on the edges of $T$.

Each probabilistic $\Sigma$-process now induces an obvious probability distribution over
the states in which the domain described by $\Sigma$ might end up.

Fig. 4. A process $\mathcal{T}$ for Example 2 and its distribution $\pi_{\mathcal{T}}$.



Definition 3 ($\pi_{\mathcal{T}}$)

Let $\Sigma$ be a vocabulary and $\mathcal{T} = \langle T; \mathcal{I}; \mathcal{P} \rangle$ a probabilistic $\Sigma$-process. By $\pi_{\mathcal{T}}$ we
denote the probability distribution that assigns to each Herbrand interpretation $I$
of $\Sigma$ the probability $\sum_{s \in L_{\mathcal{T}}(I)} \mathcal{P}(s)$, where $L_{\mathcal{T}}(I)$ is the set of all leaves $s$ of $T$ for
which $\mathcal{I}(s) = I$.

Like any probability distribution over interpretations, such a $\pi_{\mathcal{T}}$ also defines a
set of possible worlds, namely that consisting of all $I$ for which $\pi_{\mathcal{T}}(I) > 0$. If all the
probabilities $\mathcal{P}(s)$ are non-zero, then this is simply the set of all $\mathcal{I}(l)$ for which $l$ is
a leaf of $T$.
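To make the two definitions above concrete, the following Python sketch computes the node probabilities as products of edge labels and sums them over leaves with equal interpretations. The nested-tuple encoding of a tree and the atoms A, B, C are assumptions made purely for this illustration.

```python
from collections import defaultdict

def leaf_distribution(node, prob=1.0, dist=None):
    """Accumulate the distribution over leaf interpretations.

    A node is a pair (interpretation, children), where children is a list of
    (edge probability, child node) pairs. `prob` is the product of the edge
    labels on the path from the root to `node`.
    """
    if dist is None:
        dist = defaultdict(float)
    interpretation, children = node
    if not children:                       # leaf: add its path probability
        dist[frozenset(interpretation)] += prob
        return dist
    for edge_prob, child in children:
        leaf_distribution(child, prob * edge_prob, dist)
    return dist

# A hypothetical tree in which two different leaves carry the interpretation {A, B}.
tree = ({"A"}, [
    (0.5, ({"A", "B"}, [
        (0.4, ({"A", "B", "C"}, [])),
        (0.6, ({"A", "B"}, [])),
    ])),
    (0.5, ({"A"}, [
        (0.2, ({"A", "B"}, [])),
        (0.8, ({"A"}, [])),
    ])),
])
pi = leaf_distribution(tree)
```

In this example the interpretation {A, B} receives the mass of both leaves that carry it, which is exactly the summation over the set of leaves in Definition 3.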

We now want to relate the transitions in such a probabilistic $\Sigma$-process to the
events described by a CP-theory.

Definition 4 (rules firing)

Let $\Sigma$ be a vocabulary, $C$ a CP-theory in this vocabulary and $\mathcal{T}$ a probabilistic
$\Sigma$-process. Let $r \in C$ be a CP-law of the form:

$(A_1 : \alpha_1) \lor \cdots \lor (A_n : \alpha_n) \leftarrow \varphi.$

We say that $r$ fires in a node $s$ of $\mathcal{T}$ if $s$ has $n + 1$ children $s_1, \ldots, s_{n+1}$, such that:

• For all $1 \leq i \leq n$, $\mathcal{I}(s_i) = \mathcal{I}(s) \cup \{A_i\}$ and the probability of edge $(s, s_i)$ is $\alpha_i$;

• For $s_{n+1}$, $\mathcal{I}(s_{n+1}) = \mathcal{I}(s)$ and the probability of the edge $(s, s_{n+1})$ is $1 - \sum_i \alpha_i$.

For simplicity, we will omit edges labeled with a probability of zero; this does not 
affect any of the following material. 
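As an illustration of Definition 4, the sketch below builds the children of a node in which a CP-law fires. The function name `fire` and the encoding of a head as a list of (atom, probability) pairs are our own assumptions; the head used in the example is the one given for Example 2 above.

```python
def fire(interpretation, head):
    """Children created when a CP-law with the given head fires in a node.

    `head` is a list of (atom, probability) pairs. The result is a list of
    (edge probability, child interpretation) pairs: one child per head atom,
    plus one child in which nothing is caused.
    """
    state = frozenset(interpretation)
    children = [(alpha, state | {atom}) for atom, alpha in head]
    remainder = 1.0 - sum(alpha for _, alpha in head)
    children.append((remainder, state))
    # Edges with probability zero are omitted, as in the text.
    return [(p, child) for p, child in children if p > 1e-12]

children = fire({"Infection"}, [("Pneumonia", 0.4), ("Angina", 0.1)])
```

Here the node gets three children: one where Pneumonia is caused (0.4), one where Angina is caused (0.1), and one where the interpretation is unchanged (the remaining 0.5).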

This definition now allows us to link the transitions in a probabilistic $\Sigma$-process
$\mathcal{T}$ to the events of a CP-theory $C$. Formally, we will consider a mapping $\mathcal{E}$ from
each non-leaf node $s$ of $\mathcal{T}$ to an associated CP-law $r \in C$. Because the same ground
CP-law should fire at most once in each branch, the following definition will also
consider, for a node $s$, the set of all CP-laws that have not yet fired in $s$, i.e., the set
of all $r \in C$ for which there does not exist an ancestor $s'$ of $s$ such that $\mathcal{E}(s') = r$.
We will denote this set as $\mathcal{R}_{\mathcal{E}}(s)$.

Definition 5 (execution model, positive case)

Let $C$ be a positive CP-theory and $X$ an interpretation of the exogenous predicates.
A probabilistic $\Sigma$-process $\mathcal{T} = \langle T; \mathcal{I} \rangle$ is an execution model of $C$ in context $X$,
written $\mathcal{T} \models_X C$, iff there exists a mapping $\mathcal{E}$ from the non-leaf nodes of $T$ to $C$,
such that:

• For the root $\bot$ of $T$, $\mathcal{I}(\bot) = X$;

• In each non-leaf node $s$, a CP-law $\mathcal{E}(s) \in \mathcal{R}_{\mathcal{E}}(s)$ fires, such that its precondition
is satisfied in $s$, i.e., $\mathcal{I}(s) \models body(\mathcal{E}(s))$;

• For each leaf $l$ of $T$, there are no CP-laws $r \in \mathcal{R}_{\mathcal{E}}(l)$ for which $\mathcal{I}(l) \models body(r)$.

If there are no exogenous predicates, we simply write $\mathcal{T} \models C$.

Example 2 has one execution model for every specific context X; the process for
X = {Infected} is depicted in Figure 4. As we showed earlier with the window breaking
example, there also exist theories which allow multiple execution models
for a given context. However, all of these execution models must then generate the
same probability distribution over their final states.

Theorem 2 (Uniqueness — positive case) 

Let $C$ be a positive CP-theory. If $\mathcal{T}_1$ and $\mathcal{T}_2$ are both execution models of $C$, then
$\pi_{\mathcal{T}_1} = \pi_{\mathcal{T}_2}$.

Proof

Proof of this theorem can be found in Section A.2. □
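The uniqueness property can be checked numerically on small theories. The sketch below is not the proof, only an experiment under simplifying assumptions: the theory is propositional, bodies are conjunctions of atoms, and the four laws (with atoms ThrowsA, ThrowsB, Break) are a hypothetical example of our own. The parameter `pick` plays the role of the choice of which applicable law fires in a node.

```python
from collections import defaultdict

# Each ground CP-law is (head, body): head is a list of (atom, probability)
# pairs, body is a set of atoms that must all be true (a positive conjunction).
THEORY = [
    ([("Break", 0.8)], {"ThrowsA"}),
    ([("Break", 0.6)], {"ThrowsB"}),
    ([("ThrowsA", 0.5)], set()),
    ([("ThrowsB", 0.5)], set()),
]

def distribution(theory, context, pick):
    """Distribution over final states of the execution model selected by `pick`."""
    dist = defaultdict(float)

    def expand(state, fired, prob):
        applicable = [i for i, (_, body) in enumerate(theory)
                      if i not in fired and body <= state]
        if not applicable:                 # leaf: no unfired law is applicable
            dist[state] += prob
            return
        i = pick(applicable)               # the law that fires in this node
        head, _ = theory[i]
        rest = 1.0 - sum(alpha for _, alpha in head)
        for atom, alpha in head:
            expand(state | {atom}, fired | {i}, prob * alpha)
        if rest > 1e-12:                   # zero-probability edges are omitted
            expand(state, fired | {i}, prob * rest)

    expand(frozenset(context), frozenset(), 1.0)
    return dict(dist)

first = distribution(THEORY, set(), pick=min)   # always fire the first applicable law
last = distribution(THEORY, set(), pick=max)    # always fire the last applicable law
```

The two selection policies build different trees, yet both assign the same probability to every final state, as the theorem predicts for positive theories.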

This theorem shows that knowing all the probabilistic causal laws of a domain 
gives enough information to predict a single probability distribution over the final 
states that this domain might reach. This result is important for two reasons. 

First, it suggests an appealing explanation for why causality is such a useful and 
important concept: causal information tells you just enough about the behaviour 
of a probabilistic process to be able to predict its final outcome in every possible 
context, while allowing irrelevant details to be ignored. As such, it offers a compact 
and robust representation of the class of probability distributions that can result 
from such a process. Second, in our construction of CP-logic, we have started from 
Shafer's dynamic analysis of causality, using the probability tree as a basic semantic 
object. In this respect, our approach differs from that of causal Bayesian networks 
(Pearl 2000), in which causal information is viewed more statically, with probability
distributions as basic semantical objects. The above theorem relates these two
views, because it allows us to not only view a CP-theory as describing a class of 
processes, but also as defining a unique probability distribution. 

Definition 6 ($\pi_C^X$)

Let $C$ be a CP-theory and $X$ an interpretation for the exogenous predicates of $C$.
By $\pi_C^X$, we denote the unique probability distribution $\pi_{\mathcal{T}}$ for which $\mathcal{T} \models_X C$. If
there are no exogenous predicates, we simply write $\pi_C$.

A CP-theory can be viewed as mapping each interpretation for the exogenous
predicates to a probability distribution over interpretations of the endogenous
predicates or, to put it another way, as a conditional distribution over interpretations
of the endogenous predicates, given an interpretation for the exogenous predicates.

Definition 7 (models of a CP-theory)

Let $C$ be a CP-theory and $\pi$ a probability distribution over interpretations of all
the predicates of $C$. $\pi$ is a model of $C$, denoted $\pi \models C$, iff for each interpretation
$X$ of the exogenous predicates with $\pi(X) > 0$ and each interpretation $J$ of the
endogenous predicates, $\pi(J \mid X) = \pi_C^X(J)$.

If a CP-theory $C$ has no exogenous predicates, then there is a unique $\pi$ for which
$\pi \models C$ and this is, of course, simply the distribution $\pi_C$.

Having defined this formal semantics for CP-logic, it is natural to ask how the 
causal interpretation that we have informally attached to expressions in our lan- 
guage is reflected in it. We see that our semantics essentially consists of the following 
two causal principles: 

• The principle of universal causation states that all changes to the state of the 
domain must be triggered by a causal law whose precondition is satisfied. 

• The principle of sufficient causation states that if the precondition to a causal 
law is satisfied, then the event that it triggers must eventually happen. 

Together with our decision to use Shafer's probability trees as our basic semantical
objects and the particular representation that we have chosen for probabilistic
causal laws, these two principles essentially determine our logic completely — at 
least, in the positive case. In the following sections, we turn our attention to the 
question of how to extend our definitions to the case where negation can appear 
in the precondition of a CP-law. However, this requires us to first discuss in more 
detail a particular modeling methodology for CP-logic. 

4 Modeling more complex processes in CP-logic 

In the formal semantics of CP-logic, the interpretations $\mathcal{I}(s)$ associated to nodes $s$
of the probability trees play an important role. Indeed, if we forget for a moment
the restriction that each causal law can fire at most once in each branch, then the
interpretation $\mathcal{I}(s)$ completely determines which of the causal laws can fire in $s$.
In our account of CP-logic so far, we have suggested that the logical vocabulary
$\Sigma$ of a CP-theory be chosen in such a way that possible states of the domain of
discourse correspond to (Herbrand) interpretations for $\Sigma$. However, this assumption
restricts the kind of causal laws that can be represented in at least two different,
but related, ways. First, it means that we can only describe events that are caused
by some property of the current state $s$. In particular, it is not possible to say
that an event is caused by something which happened previously, but no longer
has any visible effect on the current state. Second, since the interpretations $\mathcal{I}(s)$
grow monotonically throughout a branch, i.e., no atoms are ever removed from
such an interpretation, it also means that we actually cannot even describe events
whose effects manifest themselves in some state, but then disappear again in a
future state. The following example illustrates these limitations of the view that an
interpretation $\mathcal{I}(s)$ represents precisely the state of the domain at node $s$.

Example 4

In 10% of the cases, pneumonia causes permanent lung damage, which persists after
the pneumonia itself has disappeared. Let us also assume that the probability of
getting pneumonia is 0.3. One attempt to model this is as follows:

$(LungDamage : 0.1) \leftarrow Pneumonia.$ (12)
$(Pneumonia : 0.3).$ (13)

The problem with this theory is that, under the natural interpretation of the
predicates, it violates the assumptions made by CP-logic: after pneumonia has been
caused and has in turn led to permanent lung damage, it might go away again.
As such, it is no surprise that, according to the formal semantics of this theory, the
probability of a patient having permanent lung damage and no pneumonia is zero,
while in reality, this situation is perfectly possible.

There is, however, a simple solution to this problem, at least if we are prepared to
refine our informal interpretation of the atom Pneumonia. Instead of interpreting
this atom as representing the real-world property that "the patient has pneumonia",
we can also interpret it as representing the property that "at some point in time,
the patient has had pneumonia". It is obvious that this now is a property that,
once initiated, will forever persist. The CP-law (12) now reads as: "if the patient
has, at some point, had pneumonia, then this causes him to have lung damage
with probability 0.1." According to this reading, it is now only the case that it is
impossible for a patient to have lung damage if he has not at some point in time
had pneumonia, which is of course a conclusion that should follow from our problem
statement.

To fix this example, we had to subtly change the correspondence between the
states of the formal execution model and the states of the real world: whereas
previously, each of our formal states precisely matched one state of the real world,
it is now the case that a formal state actually represents both the state of the world
at some particular time and also certain information about the history of the world
up to that time. Taking this idea further actually allows us to describe processes in
considerable temporal detail, as the following example illustrates.

Example 5

A patient is admitted to hospital with pneumonia and stays there for a number of
days. Each day, the pneumonia might cause him to suffer chest pain on that particular
day with probability 0.6. With probability 0.8, a patient who has pneumonia
on one day still has pneumonia the next day.

On the one hand, this example describes a progression through a sequence of
days. On the other hand, it also describes events that take place entirely during
one particular day. In general, a process of this kind will look something like Figure
5: the global structure of the process is a succession between different time points
and, at each particular time point, a local process might take place.

Fig. 5. A global process as a sequence of local processes.

The question now is how to model such a succession of states in CP-logic. A first
important observation is that we now need to distinguish between the values of
properties at different time points, i.e., we can no longer represent every relevant
property by a single ground atom, but instead we need a ground atom for every pair
of such a property and a time point. Typically, one would construct a vocabulary
by adding time as an argument to predicates, as is done in, e.g., the event calculus
or situation calculus. For instance, to describe Example 5, we could construct a
vocabulary which has the following ground atoms:

• Referring to day 1: {Pneumonia(1), Chestpain(1)};

• Referring to day 2: {Pneumonia(2), Chestpain(2)};

• ...

• Referring to day n: {Pneumonia(n), Chestpain(n)}.

Of course, it might be equally possible to use some other representation, such
as Pneumonia(SecondDay) or $Pneumonia_2$ instead of Pneumonia(2). With the
above vocabulary, we can now model Example 5. We assume a fixed range 1..n of
days, to ensure finiteness of the grounded theory.

$Pneumonia(1).$ (14)
$\forall d\; (Pneumonia(d + 1) : 0.8) \leftarrow Pneumonia(d).$ (15)
$\forall d\; (Chestpain(d) : 0.6) \leftarrow Pneumonia(d).$ (16)

Here, the CP-laws described by (15) are of the kind that propagate from one time
point to a later time point, whereas (16) describes a class of "instantaneous" events,
taking place entirely inside of a single time point. Of course, whether a particular
event is instantaneous depends greatly on which unit of time is being used: one can
imagine that it makes a difference whether we measure time in seconds or in days.

According to the informal description of Example 5, the intended model is the
process shown in Figure 6. It can easily be seen that this is indeed an execution
model of the above CP-theory. We remark that this theory also has other execution
models, which do not respect the proper ordering of time points, such as, e.g.,
the process in which all events caused by (15) happen before those caused by
(16). However, since these "wrong" processes all generate the same probability
distribution as the intended process anyway, this is harmless.
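Because every execution model of this theory generates the same distribution, the marginal probabilities of the individual atoms can be computed directly. The following Python sketch does so under the observation that each ground atom is caused by exactly one ground law with a single-atom body; the function name and parameter names are our own.

```python
def example5_marginals(n, p_persist=0.8, p_pain=0.6):
    """Marginal probabilities of Pneumonia(d) and Chestpain(d) for d = 1..n."""
    marginals = {}
    p_pneumonia = 1.0                      # Pneumonia(1) is a fact, law (14)
    for day in range(1, n + 1):
        marginals[f"Pneumonia({day})"] = p_pneumonia
        # Law (16): chest pain on this day is caused with probability 0.6,
        # provided the patient has pneumonia on this day.
        marginals[f"Chestpain({day})"] = p_pneumonia * p_pain
        # Law (15): pneumonia carries over to the next day with probability 0.8.
        p_pneumonia *= p_persist
    return marginals

m = example5_marginals(3)
```

For day d this gives a probability of 0.8^(d-1) for Pneumonia(d) and 0.6 · 0.8^(d-1) for Chestpain(d), independently of the order in which the laws fire.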

We also observe that, again, the correspondence between the states of the execution
model and the states of the real world is less direct than it was in the examples
of Section 3.2. Indeed, now, a state of an execution model contains a trace of the
entire evolution of the real world until a certain point in time. As such, a leaf of
the execution model now represents a complete history of the world, whereas in the
examples of Section 3.2, it only represented the final state of the process.

Fig. 6. Initial segment of the intended model of Example 5.

Let us now make the above discussion more formal. We assume that, when constructing
the vocabulary $\Sigma$, we had in mind some function $\lambda$ from the Herbrand
base of $\Sigma$ to an interval $[0..n] \subseteq \mathbb{N}$, such that, in our desired interpretation of this
vocabulary, each atom $p$ refers to the state of some property at time point $\lambda(p)$.
We call such a function a timing and $\lambda(p)$ the time of atom $p$. In the typical case of
predicates containing an explicit temporal argument, such a timing would simply
map atoms onto this argument; for instance, in the case of the above example, we
had in mind the following timing $\lambda$:

• For each ground atom $Pneumonia(i)$, $\lambda(Pneumonia(i)) = i$;

• For each ground atom $Chestpain(i)$, $\lambda(Chestpain(i)) = i$.

If we now look again at the CP-laws we wrote for this example, we observe that, 
whenever there is an atom in the head of a CP-law r that refers to the truth of some 
property at time i and an atom in the body of r that refers to the truth of some 
property at time j, it is the case that $i \geq j$. This is of course not a coincidence.
Indeed, because, in the real world, causes precede effects, it should be impossible
that the cause-effect propagation described by a CP-law goes backwards in time.
Note that it is also possible that $i = j$; in this case, the CP-law is instantaneous
w.r.t. the granularity of time that is being used, i.e., it describes one of those
events (such as (16) in Example 5) that take place entirely within a single time
point. Another, perhaps more illustrative, example of an instantaneous CP-law is 
the statement that an increase in the current flowing through a resistor causes 
an increase in the voltage drop across it. Here, the increased current conceptually 
precedes the increased voltage drop, but we would never expect to actually observe 
a temporal delay. 

Definition 8 (respecting a timing)

Let $\Sigma$ be a vocabulary. A CP-theory $C$ respects a timing $\lambda$ iff, for every $r \in C$, if
$h \in headAt(r)$ and $b \in bodyAt(r)$, then $\lambda(h) \geq \lambda(b)$.

Such a timing $\lambda$ also contains information about when events might happen. To
be more concrete, if a CP-law $r$ fires at time point $i$, then we would expect $i$ to
lie somewhere between the maximum of all $\lambda(b)$ for which $b \in bodyAt(r)$, and the
minimum of all $\lambda(h)$ for which $h \in headAt(r)$. For a rule $r$, we write $t_\lambda(r)$ to denote
this interval, i.e.,

$$t_\lambda(r) = \Big[ \max_{b \in bodyAt(r)} \lambda(b),\ \min_{h \in headAt(r)} \lambda(h) \Big].$$

Now, if we are constructing a CP-theory with a particular timing $\lambda$ in mind, then
the process we are trying to model should be such that every CP-law $r$ that actually
fires does so at some time point $\kappa(r) \in t_\lambda(r)$. We will call such a mapping $\kappa$ from
rules $r \in C$ to time points $\kappa(r) \in t_\lambda(r)$ an event-timing of $\lambda$. We remark that if a
CP-law $r$ is instantaneous, then the interval $t_\lambda(r)$ will consist of a single time point
and it is indeed clearly at this time point that the CP-law should fire.
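Both the check of Definition 8 and the interval in which a rule may fire are easy to compute for a ground theory. The sketch below assumes that atoms are strings of the form `Name(i)` whose integer argument is the time point, and that a rule is a pair of atom lists (head atoms, body atoms); these encodings and the function names are hypothetical. The text does not say what the interval is for a rule with an empty body, so the sketch simply returns `None` in that case.

```python
import re

def timing(atom):
    """Timing for atoms such as 'Pneumonia(3)': the integer argument."""
    return int(re.fullmatch(r"\w+\((\d+)\)", atom).group(1))

def respects(theory, lam=timing):
    """Definition 8: no head atom is timed earlier than any body atom."""
    return all(lam(h) >= lam(b)
               for head, body in theory for h in head for b in body)

def interval(rule, lam=timing):
    """The pair (latest body time, earliest head time) for one rule."""
    head, body = rule
    if not body:          # assumption of this sketch: undefined for facts
        return None
    return (max(lam(b) for b in body), min(lam(h) for h in head))

# Ground instances of laws (15) and (16) for days 1..3.
theory = [(["Pneumonia(%d)" % (d + 1)], ["Pneumonia(%d)" % d]) for d in (1, 2)]
theory += [(["Chestpain(%d)" % d], ["Pneumonia(%d)" % d]) for d in (1, 2, 3)]

ok = respects(theory)
propagating = interval(theory[0])     # Pneumonia(2) <- Pneumonia(1)
instantaneous = interval(theory[2])   # Chestpain(1) <- Pneumonia(1)
```

For the propagating law the interval spans days 1 to 2, whereas for the instantaneous law it collapses to the single time point 1.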

A timing $\lambda$ therefore imposes the following constraint on which processes can be
considered reasonable.

Definition 9 (following a timing)

Let $\Sigma$ be a vocabulary with timing $\lambda$ and let $C$ be a CP-theory that respects $\lambda$. A
probabilistic $\Sigma$-process $\mathcal{T}$ follows $\lambda$ if there exists an event-timing $\kappa$ of $\lambda$ such that
the CP-laws of $\mathcal{T}$ fire in the order imposed by $\kappa$, i.e., if for all successors $s'$ of a
node $s$, $\kappa(\mathcal{E}(s')) \geq \kappa(\mathcal{E}(s))$.

It can now be shown that for any timing $\lambda$ and any CP-theory $C$ respecting $\lambda$,
$C$ will have an execution model that follows $\lambda$.

Theorem 3

Let $C$ be a CP-theory respecting a timing $\lambda$. There exists an execution model $\mathcal{T}$ of
$C$, such that $\mathcal{T}$ follows $\lambda$.

Proof

Proof of this theorem can be found in Section A.3. □

This result shows that if we construct a CP-theory C with a particular timing in 
mind, then C will have an execution model in which the events happen in precisely 
the order dictated by this timing. Therefore, the modeling methodology that we 
have suggested in this section is indeed valid. In the case of Example 5, the process
shown in Figure 6 is an execution model that follows the timing $\lambda$ specified above.

In the sequel, we will refer to CP-theories, for whose vocabulary we have some 
intended timing in mind, as temporal CP-theories; other CP-theories will be called 
atemporal. 

5 CP-logic with negation 

So far, we have only allowed positive formulas as preconditions of CP-laws. In this 
section, we examine whether it is possible to relax this requirement. We first look 
at a small example.

Example 6

Having pneumonia causes a patient to receive treatment with probability 0.95.
Untreated pneumonia causes fever with probability 0.7.

$(Fever : 0.7) \leftarrow Pneumonia \land \lnot Treatment.$ (17)
$(Treatment : 0.95) \leftarrow Pneumonia.$ (18)

Figure 7 shows two processes for this example that satisfy all the requirements
that we previously imposed for positive theories. It is obvious, however, that in this
case the final outcome is affected by the order in which events occur. So, simply
including negation in this naive way would give rise to ambiguities, causing our
desirable uniqueness property (Theorem 2) to be lost.
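The effect of the firing order on this example can be reproduced with a short computation. The sketch below is a simplification that suffices for this two-law theory only: the laws are tried one after the other in a fixed order, and a law fires in a state if its body holds there at that moment. The encoding of bodies as a pair of positive and negated atom sets is our own assumption.

```python
from collections import defaultdict

# label -> (head atom, probability, positive body atoms, negated body atoms)
LAWS = {
    17: ("Fever", 0.7, {"Pneumonia"}, {"Treatment"}),
    18: ("Treatment", 0.95, {"Pneumonia"}, set()),
}

def fever_probability(order, context=frozenset({"Pneumonia"})):
    """P(Fever) when the laws are tried in `order` (naive, no temporal precedence)."""
    dist = defaultdict(float)
    dist[context] = 1.0
    for label in order:
        atom, alpha, pos, neg = LAWS[label]
        nxt = defaultdict(float)
        for state, prob in dist.items():
            if pos <= state and not (neg & state):   # body holds right now
                nxt[state | {atom}] += prob * alpha
                nxt[state] += prob * (1 - alpha)
            else:                                    # law does not fire here
                nxt[state] += prob
        dist = nxt
    return sum(p for state, p in dist.items() if "Fever" in state)

treatment_first = fever_probability((18, 17))
fever_first = fever_probability((17, 18))
```

If the treatment law fires first, fever has probability 0.05 · 0.7 = 0.035; if the fever law fires first, it has probability 0.7. The two orders therefore yield different distributions over final states.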

Giving up the uniqueness property would have grave consequences for the logic 
and its practical use. One radical solution to the problem might be to force the 
user to not only specify causal probabilistic events, but also information about the 
order in which these events can happen. However, such information is difficult to 
obtain and represent; moreover, in many cases, it would just be useless overhead — 
indeed, as we have already seen, one of the most interesting features of CP-logic 
without negation is precisely the fact that we can obtain a complete probability 
distribution without requiring such information. The solution that we will adopt 
instead is to restrict the class of processes associated to a CP-theory in such a way 
that the uniqueness property is preserved, i.e., all processes from this restricted 
class generate the same distribution over the final states. 

To introduce the additional constraint that will be imposed on execution models,
let us take a closer look at the above example. We observe that, in the process of
Figure 7(b), event (17) is caused at a moment when its precondition is not yet in its
final state. In particular, when (17) happens in the initial state, its precondition
$\lnot Treatment$ holds, but later on, for instance in the leftmost branch, event (18)
causes Treatment, thereby falsifying this precondition. So, in the final state of this
branch, we see that Fever holds, while the precondition of the CP-law that caused
it no longer holds.

In light of this discussion, we can now explain the additional assumption that
CP-logic makes about the causal processes. This assumption, called the temporal
precedence assumption, is that a CP-law r will not fire until its precondition is
in its final state. More precisely, it cannot fire until the part of the process that
determines whether its precondition holds is fully finished. For Example 6, it is
clear that only the process in Figure 7(a) satisfies this assumption, and so, in this
case, the ambiguity has been resolved.

We stress here that temporal precedence is nothing more than an assumption:
inherently, there is nothing wrong with the causal process in Figure 7(b) and we
could in fact easily imagine that, because fever is one of the earliest symptoms of
pneumonia, this process is actually a better model of the real world than that in
Figure 7(a). So why do we choose to eliminate precisely these processes, in order
to regain our uniqueness result?

To explain our motivation for this, we need to go back to the analysis of Section
4. There, we considered timed vocabularies, in which ground atoms are intended
to represent properties at some particular point in time. We then proved that each 
temporal theory without negation has an execution model that follows its timing, 
i.e., in which events happen in the right order. As we remarked, such a theory may 
also have other execution models, in which events happen in the wrong order, but 
this is not a problem, because all execution models of a positive theory generate 
the same probability distribution anyway. For theories with negation, however, the 
situation is more complicated. In that case, we can have three different kinds of 
execution models: those which follow the timing; those which do not follow the 
timing, but nevertheless generate the same probability distribution as the ones 
that do; and those which do not follow the timing and also generate a different 
probability distribution. The right way to resolve the ambiguity for these theories 
is obviously to reject this last kind of execution model. 

As we will prove in Section 6.3, temporal precedence will do precisely this, at
least if the CP-laws containing negation are not instantaneous. Intuitively, this can
be explained as follows. For such a non-instantaneous CP-law, the timings of the
atoms in its precondition are strictly earlier than those of its effects. Therefore,
in a process which follows this timing, all events which cause one of these atoms
must happen before the CP-law itself fires. This is now precisely what temporal
precedence assumes. The following example is a variant (or rather, a refinement)
of Example 6, which illustrates this.

Example 7 

A patient enters the hospital, possibly suffering from pneumonia. At this time, he 
will be examined by a physician, who will decide to treat the patient with probability 
0.95 if he actually has pneumonia. If the patient has pneumonia but the doctor does 
not treat him, there is a probability of 0.7 that the patient will exhibit a fever by 
the next morning. We introduce the following propositions: 

• Pneumonia: "the patient has pneumonia when entering the hospital";

• Treatment: "the patient is treated upon admission";

• Fever: "the patient has a fever the next morning".

Under this interpretation of our vocabulary, the CP-theory of Example 6 respects
the timing and correctly models the example. Clearly, the intended model of the
theory is now that of Figure 7(a): in this model, CP-laws fire in the right temporal
order. The process of Figure 7(b), on the other hand, goes against the flow of time
("treatment upon admission" is only caused after "fever the next morning"), which
should be impossible.

So, as this example illustrates, if our CP-theory respects some intended timing 
such that the CP-laws containing negation are non-instantaneous, the temporal 
precedence assumption will resolve the ambiguity in the right way, i.e., by selecting 
precisely those processes that follow the intended timing. We will now formally 
define temporal precedence and prove afterwards, in Section 6.3, that this property
holds in general. 

We start by introducing some mathematical machinery. The basic idea is that 
a CP-law should only fire after all events that might still affect the truth of its 
precondition have already happened, i.e., this precondition should not merely be 
currently true, but should in fact already be guaranteed to also remain true in all 
potential future states. This naturally leads to a three-valued logic, where we have 
truth values t (guaranteed to remain true), f (guaranteed to remain false), and u 
(still subject to change). Recall that a three-valued interpretation $\nu$ is a mapping
from the ground atoms of our vocabulary to the set of truth values {t, f, u}, which
induces for each formula $\varphi$ a truth value $\varphi^\nu$.

Now, if our probabilistic process is in a state $s$, then the atoms of which we are
already sure that they are true are precisely those in $\mathcal{I}(s)$. To figure out which
atoms are still unknown, we need to look at which CP-laws might still fire, i.e., at
those rules $r$ for which $body(r)^\nu \neq \mathbf{f}$. Whenever we find such a rule, we know that
the atoms in head(r) might all still be caused and, as such, they must be at least
unknown. We will now look at a derivation sequence, in which we start by assuming
that everything that is currently not t is f and then gradually build up the set of
unknown atoms by applying this principle.

Definition 10 (hypothetical derivation sequence)

A hypothetical derivation sequence in a node $s$ is a sequence $(\nu_i)_{0 \leq i \leq n}$ of three-valued
interpretations that satisfies the following properties. Initially, $\nu_0$ assigns f
to all atoms not in $\mathcal{I}(s)$. For each $0 \leq i < n$, there must be a rule $r$ with $body(r)^{\nu_i} \neq \mathbf{f}$,
such that, for all $p \in headAt(r)$ with $\nu_i(p) = \mathbf{f}$, it is the case that $\nu_{i+1}(p) = \mathbf{u}$,
while for all other atoms $p$, $\nu_{i+1}(p) = \nu_i(p)$.

Such a sequence is terminal if it cannot be extended. A crucial property is now
that all such sequences reach the same limit.

Theorem 4

Every terminal hypothetical derivation sequence reaches the same limit, i.e., if
$(\nu_i)_{0 \leq i \leq n}$ and $(\nu'_i)_{0 \leq i \leq m}$ are such sequences, then $\nu_n = \nu'_m$.

Proof 

Proof of this theorem is given in Section A.1. □
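The limit of a terminal hypothetical derivation sequence can be computed by a simple fixpoint iteration. The Python sketch below makes several assumptions of its own: bodies are conjunctions of literals, encoded as a set of positive atoms and a set of negated atoms; truth values are encoded as integers so that conjunction is a minimum and negation a reflection (the usual three-valued reading); atoms in the current state are given the value t; and `laws` is expected to contain the rules that might still fire. The laws used in the example are (17) and (18) of Example 6.

```python
F, U, T = 0, 1, 2  # truth values, ordered f < u < t

def body_value(pos, neg, nu):
    """Three-valued truth value of a conjunction of literals under nu."""
    values = [nu[a] for a in pos] + [T - nu[a] for a in neg]
    return min(values, default=T)

def potential(laws, atoms, state):
    """Limit of a terminal hypothetical derivation sequence for `state`.

    `laws` is a list of (head atoms, positive body atoms, negated body atoms).
    """
    nu = {a: (T if a in state else F) for a in atoms}   # starting point nu_0
    changed = True
    while changed:                     # extend the sequence until it is terminal
        changed = False
        for head_atoms, pos, neg in laws:
            if body_value(pos, neg, nu) != F:
                for p in head_atoms:
                    if nu[p] == F:     # atom might still be caused: f becomes u
                        nu[p] = U
                        changed = True
    return nu

ATOMS = {"Pneumonia", "Treatment", "Fever"}
LAWS = [
    (["Fever"], {"Pneumonia"}, {"Treatment"}),   # law (17)
    (["Treatment"], {"Pneumonia"}, set()),       # law (18)
]
nu = potential(LAWS, ATOMS, {"Pneumonia"})
nu_reversed = potential(list(reversed(LAWS)), ATOMS, {"Pneumonia"})
```

Visiting the laws in either order yields the same limit, in line with Theorem 4: Pneumonia is t, while Treatment and Fever are both u.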

For a state $s$ in a probabilistic process, we will denote this unique limit as $\nu_s$
and refer to it as the potential in $s$. Such a $\nu_s$ now provides us with an estimate
of which atoms might still be caused, given that we are already in state $s$. We can
now tell whether the part of the process that determines the truth of a formula $\varphi$
has already finished by looking at $\nu_s$; indeed, we can consider this process to be
finished iff $\varphi^{\nu_s} \neq \mathbf{u}$. We now extend the concept of an execution model to arbitrary
CP-theories as follows.



Definition 11 (execution model, general case)

Let $C$ be a CP-theory in vocabulary $\Sigma$, $\mathcal{T}$ a probabilistic $\Sigma$-process, and $X$ an
interpretation of the exogenous predicates of $C$. $\mathcal{T}$ is an execution model of $C$ in
context $X$ iff

• $\mathcal{T}$ satisfies the conditions of Definition 5 (execution model, positive case);

• For every non-leaf node $s$, $body(\mathcal{E}(s))^{\nu_s} \neq \mathbf{u}$, with $\nu_s$ the potential in $s$.

From now on, we will refer to a probabilistic $\Sigma$-process that satisfies the original
conditions of Definition 5, but not necessarily the additional condition imposed
above, as a weak execution model.

In the case of Example 6, this definition indeed gives us the result described above,
i.e., the process in Figure 7(a) is an execution model of the example, while the one
in Figure 7(b) is not. Indeed, if we look at the root $\bot$ of this tree, with
$\mathcal{I}(\bot) = \{Pneumonia\}$, we see that we can construct the following terminal hypothetical
derivation sequence:

• $\nu_0$ assigns f to Treatment and Fever;

• $\nu_1$ assigns u to Treatment;

• $\nu_2$ assigns u to Fever, because $(\lnot Treatment \land Pneumonia)^{\nu_1} = \mathbf{u}$.

As such, the only CP-law that can initially fire is the one by which the patient
might receive treatment. Afterwards, in every descendant $s$ of $\bot$, $\nu_s(Treatment)$
will be either t or f. In the branch where it is f, the event by which the patient
gets fever because of untreated pneumonia will subsequently happen.
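The construction of an execution model under temporal precedence can be sketched for this example as follows. Several choices here are assumptions of ours rather than part of the formal definitions: bodies are encoded as positive and negated atom sets, the potential is computed from the laws that have not yet fired in the current branch (the reading under which Treatment becomes f or t in the descendants of the root), and a law is treated as ready to fire when its body evaluates to t under the potential, i.e., when it holds and is no longer subject to change.

```python
from collections import defaultdict

F, U, T = 0, 1, 2  # truth values, ordered f < u < t

# label -> (head atom, probability, positive body atoms, negated body atoms)
LAWS = {
    17: ("Fever", 0.7, {"Pneumonia"}, {"Treatment"}),
    18: ("Treatment", 0.95, {"Pneumonia"}, set()),
}
ATOMS = {"Pneumonia", "Treatment", "Fever"}

def value(pos, neg, nu):
    """Three-valued value of a conjunction of literals."""
    return min([nu[a] for a in pos] + [T - nu[a] for a in neg], default=T)

def potential(state, unfired):
    """Potential in a node: atoms that unfired laws might still cause become u."""
    nu = {a: (T if a in state else F) for a in ATOMS}
    changed = True
    while changed:
        changed = False
        for label in unfired:
            atom, _, pos, neg = LAWS[label]
            if value(pos, neg, nu) != F and nu[atom] == F:
                nu[atom] = U
                changed = True
    return nu

def final_distribution(context, pick):
    """Distribution over final states when only 'ready' laws may fire."""
    dist = defaultdict(float)

    def expand(state, unfired, prob):
        nu = potential(state, unfired)
        ready = [l for l in sorted(unfired)
                 if value(LAWS[l][2], LAWS[l][3], nu) == T]
        if not ready:
            dist[state] += prob
            return
        label = pick(ready)
        atom, alpha, _, _ = LAWS[label]
        rest = unfired - {label}
        expand(state | {atom}, rest, prob * alpha)
        if 1 - alpha > 1e-12:
            expand(state, rest, prob * (1 - alpha))

    expand(frozenset(context), frozenset(LAWS), 1.0)
    return dict(dist)

d = final_distribution({"Pneumonia"}, pick=min)
```

In the root only law (18) is ready, so the treatment decision is always made first; fever then arises only in the untreated branch, with overall probability 0.05 · 0.7 = 0.035, which corresponds to the process of Figure 7(a).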

The temporal precedence assumption imposes a constraint on the order in which 
CP-laws can fire and hence, on the order in which atoms can be caused to become 
true. In the case of Example 6, this order is fixed and can easily be derived from the
syntactical structure of the CP-theory. This is not always the case. As the following 
example illustrates, the order of events may depend on the context in which they 
happen. 



Example 8 

A software system consists of two servers that provide identical services. One server 
acts as master and the other as slave, and these roles are assigned randomly. Clients 
can request services. The master makes a selection among these request and the 
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slave fulfills the requests that are not accepted by the master. 



$(Master(S1) : 0.5) \lor (Slave(S1) : 0.5).$ (19)
$Master(S2) \leftarrow Slave(S1).$ (20)
$Slave(S2) \leftarrow Master(S1).$ (21)
$\forall x \forall s\ (Accepts(x, s) : 0.6) \leftarrow Application(s) \land Master(x).$ (22)
$\forall x \forall s\ Accepts(x, s) \leftarrow Application(s) \land Slave(x) \land Master(y) \land \neg Accepts(y, s).$ (23)





In all causal processes that satisfy the temporal precedence assumption, the master 
accepts services before the slave does. However, because who is slave and who is 
master depends on the result of event (19), this means that we cannot say up front
which of the atoms Accepts(S1, s) and Accepts(S2, s) will be caused first.
This shows that the temporal precedence assumption induces a context dependent 
stratification on both events and atoms. 

The temporal precedence assumption is correct for many theories — including, as 
we will prove later, all those temporal theories in which CP-laws containing negation 
are not instantaneous — but not for all. 

Example 9 

We consider a variant of the problem of Example 8 in which the slave does not have
to wait for the decision of the master, but is allowed to accept any request provided 
it has not yet been accepted by the master. It is then possible that first the slave 
and later the master accept the same request, in which case the service is provided 
two times. 

Unlike Example 8, this specification is an incomplete description of a probability
distribution. Indeed, the probability of a request being handled by the slave now also
depends on the probability of the slave reaching a decision before the master does,
which is not specified. If we try to model this example with the same CP-theory
as Example 8, the temporal precedence assumption would make one particular
assumption about the relative speed of the two servers, namely, that the slave
is always slower than the master. If we want to model some other distribution,
where the slave is sometimes faster than the master, we have to use a different
representation style, which allows such information to be incorporated. This will
be discussed later in Example 13.

What this discussion illustrates is that, ultimately, it is the responsibility of the 
user to design his CP-theory in such a way that the intended causal processes satisfy 
the temporal precedence assumption. 

6 Discussion 

We now check whether the way in which the previous section has extended the 
concept of an execution model to cope with negation indeed satisfies the goals that 
we originally stated. 
6.1 The case of positive theories

First of all, we remark that, for positive CP-theories, the new definition (Def. 11)
simply coincides with the original one (Def. 5), i.e., for positive theories, there is no
difference between execution models and weak execution models. Indeed, because,
according to our original definition, it must be the case that $\mathcal{I}(s) \models \mathcal{E}(s)$ for each
non-leaf node s, this is an immediate consequence of the following theorem, which
follows trivially from the fact that throughout a hypothetical derivation sequence,
the truth of an atom p can only increase.

Theorem 5 

Let s be a node in a probabilistic $\Sigma$-process. For any positive formula $\varphi$, if
$\mathcal{I}(s) \models \varphi$, then $\nu_s(\varphi) = t$.

We conclude that, for positive CP-theories, the new definition is simply equivalent 
to the old one. 



6.2 Uniqueness theorem regained 

Second, the uniqueness theorem now indeed extends beyond positive theories. 

Theorem 6 (Uniqueness - general case)

Let C be a CP-theory and X an interpretation of the exogenous predicates of C.
If $\mathcal{T}$ and $\mathcal{T}'$ are execution models of C in context X, i.e., $\mathcal{T} \models_X C$ and $\mathcal{T}' \models_X C$,
then $\pi_{\mathcal{T}} = \pi_{\mathcal{T}'}$.

Proof 

Proof of this theorem is given in Section [PI □ 



6.3 Correctness of temporal precedence in temporal CP-theories 

In the previous section, we introduced the temporal precedence assumption as a 
way of solving an ambiguity problem, namely the fact that different weak execution 
models of a CP-theory with negation might produce different probability distribu- 
tions. We showed that this assumption was satisfied in the temporal CP-theory of 
Example 7. We now prove that the temporal precedence assumption is satisfied in
a broad class of CP-theories, namely, in all temporal CP-theories in which events 
containing negation are non-instantaneous. To be more concrete, we will show that 
if a weak execution model follows the timing of the vocabulary, it also satisfies the 
temporal precedence assumption. 

Our first step is to refine our notion of a theory respecting a timing (Definition
5), to make a distinction between those atoms from some $body_{At}(r)$ that appear
only in a positive context and those which occur at least once in a negative context.
The set of all the latter atoms will be denoted as $body^-_{At}(r)$, whereas that of all the
former ones is $body^+_{At}(r)$.⁴



4 Formally, we define, for all sentences $\varphi$, the sets $At^+(\varphi)$ and $At^-(\varphi)$ by simultaneous induction
as:
Definition 12 (strictly respecting a timing)

A CP-theory C (with negation) strictly respects a timing $\lambda$ if, for all ground atoms
h and b:

• If there is a CP-law r with $h \in head_{At}(r)$ and $b \in body^+_{At}(r)$, then $\lambda(h) \geq \lambda(b)$;

• If there is a CP-law r with $h \in head_{At}(r)$ and $b \in body^-_{At}(r)$, then $\lambda(h) > \lambda(b)$.

Notice that we impose a stronger condition on negative conditions than on positive 
ones: the times of negative conditions should be strictly less than any caused atom. 
This condition entails that CP-laws with negation are not instantaneous. 
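
The polarity split of footnote 4 and the two conditions of Definition 12 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: formulas are ground and quantifier-free, a law is represented as a pair of head atoms and a body formula, and the names `Atom`, `Not`, `BinOp`, `polarities` and `strictly_respects` are hypothetical, not part of CP-logic.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class Not:
    arg: "Formula"


@dataclass(frozen=True)
class BinOp:  # stands for either a conjunction or a disjunction
    left: "Formula"
    right: "Formula"


Formula = Union[Atom, Not, BinOp]


def polarities(phi):
    """Return the pair (At+, At-) of a ground, quantifier-free formula."""
    if isinstance(phi, Atom):
        return {phi.name}, set()
    if isinstance(phi, Not):
        pos, neg = polarities(phi.arg)
        return neg, pos  # negation swaps the two sets
    left_pos, left_neg = polarities(phi.left)
    right_pos, right_neg = polarities(phi.right)
    return left_pos | right_pos, left_neg | right_neg


def strictly_respects(laws, timing):
    """laws: list of (head_atoms, body); timing: dict from atom name to a number."""
    for heads, body in laws:
        pos, neg = polarities(body)
        only_positive = pos - neg  # atoms that never occur negatively
        for h in heads:
            # positive conditions may be simultaneous with the effect
            if any(timing[h] < timing[b] for b in only_positive):
                return False
            # negative conditions must lie strictly before the effect
            if any(timing[h] <= timing[b] for b in neg):
                return False
    return True
```

For a ground law whose body contains a negated atom, the check succeeds only when that atom is timed strictly before every head atom, which is the non-instantaneity requirement noted above.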

Theorem 7 

Let C be a CP-theory which strictly respects a timing $\lambda$. Every weak execution
model of C that follows $\lambda$ also satisfies temporal precedence and is, therefore, an
execution model of C. Moreover, such a process always exists.



Proof 

Proof of this theorem is given in Section A.3. □

Intuitively, the theorem states that any causal process of C that is physically 
possible (i.e., in which no event is caused by conditions that arise only in the future), 
automatically satisfies temporal precedence. Hence, in the context of CP-theories 
that strictly respect some intended timing, the temporal precedence assumption 
applies naturally. 

Example 10 

We consider a time line divided into a number of different time slots, as illustrated 
in Figure 8. In the first time slot, a client sends a request to a server. If the server
receives a request, then with probability 0.5, he accepts it and sends a reply, all 
within the same time slot as that in which he received the request. If the client 
has sent a request and has not received a reply at the end of the time slot, he will 
repeat his request. A message that is sent has a probability of 0.8 of reaching the 
recipient in the same time slot as it was sent; with probability 0.1, it reaches the 
recipient only in the next slot; with the remaining probability of 0.1, it will be lost. 



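• For $p(t)$ a ground atom, $At^-(p(t)) = \{\}$ and $At^+(p(t)) = \{p(t)\}$;

• For $\varphi \circ \psi$, with $\circ$ either $\lor$ or $\land$, $At^+(\varphi \circ \psi) = At^+(\varphi) \cup At^+(\psi)$ and $At^-(\varphi \circ \psi) = At^-(\varphi) \cup At^-(\psi)$;

• For $\neg\varphi$, $At^+(\neg\varphi) = At^-(\varphi)$ and $At^-(\neg\varphi) = At^+(\varphi)$;

• For $\Theta x\ \varphi$, with $\Theta$ either $\forall$ or $\exists$, $At^+(\Theta x\ \varphi) = \bigcup_{t \in H_U(\Sigma)} At^+(\varphi[x/t])$ and $At^-(\Theta x\ \varphi) = \bigcup_{t \in H_U(\Sigma)} At^-(\varphi[x/t])$, where $H_U(\Sigma)$ is the Herbrand universe.

We can then define $body^-_{At}(r) = At^-(body(r))$ and $body^+_{At}(r) = body_{At}(r) \setminus body^-_{At}(r)$.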
Fig. 8. A division into time slots.



$(Send(Client, Req, Server, 1) : 0.7).$ (24)
$\forall t\ (Accept(t) : 0.5) \lor (Reject(t) : 0.5) \leftarrow Recvs(Server, Req, t).$ (25)
$\forall t\ Send(Server, Answer, Client, t) \leftarrow Accept(t).$ (26)
$\forall t, s, r, m\ (Recvs(r, m, t) : 0.8) \lor (Recvs(r, m, t + 1) : 0.1) \leftarrow Send(s, m, r, t).$ (27)
$\forall t\ Send(Client, Req, Server, t + 1) \leftarrow Send(Client, Req, Server, t) \land \neg Recvs(Client, Answer, t).$ (28)



In this CP-theory, (24), (25) and (26) are all instantaneous; the events described
by (27) might either take place within one time slot or constitute a propagation to
a later time slot, depending on which of the possible effects actually occurs; finally,
the events described by (28) all propagate to a later time slot. Because these last
events are the only ones in which negation occurs, this theory strictly respects its
intended timing and the theorem shows that the semantics gives the intended result.

In summary, a sensible temporal CP-theory should respect its timing. If it strictly
respects this timing (that is, if the timing is fine-grained enough to make CP-laws with
negation non-instantaneous), then all of its weak execution models will automatically
satisfy temporal precedence as well. Otherwise, there may be weak execution
models that do not satisfy temporal precedence and that are therefore ruled out as
execution models.



6.4 Validity of a CP-theory 

Not all CP-theories have an execution model. Let us illustrate this by the following 
example. 

Example 11 

A game is being played between two players, called White and Black. If White 
does not win, this causes Black to win and if Black does not win, this causes White 
to win. 

$Win(White) \leftarrow \neg Win(Black).$ (29)
$Win(Black) \leftarrow \neg Win(White).$ (30)

This theory has two weak execution models: one in which (29) fires first and White
wins with probability 1, and one in which (30) fires first and Black wins with
probability 1. However, both of these weak execution models are rejected by the
temporal precedence assumption. Indeed, in each of these weak execution models,
it is the case that, for the root $\bot$, $(\neg Win(White))^{\nu_\bot} = u = (\neg Win(Black))^{\nu_\bot}$, so
neither of the two events can happen. So, this is an example of an ambiguity that
cannot be resolved by assuming temporal precedence. In order to make a sensible
CP-theory out of this example, we would have to add additional information about
the probability that one CP-law fires before the other. As will be illustrated in
Example 13, such information can be modeled in CP-logic, but requires a different
representation style in which temporal arguments are added to the predicates.
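
The computation of the potential at the root of Example 11 can be traced with a small Python sketch. It rests on an assumed reading of the hypothetical derivation sequence, in line with the sequence shown earlier for the pneumonia example: every atom not true in the current state starts as f, and any law whose body is not f raises its still-false head atom to u. The encoding of truth values as numbers and the names `potential` and `blocked` are illustrative choices, not notation from the paper.

```python
# Kleene truth values encoded as numbers, ordered f < u < t (an encoding choice).
F, U, T = 0.0, 0.5, 1.0


def neg(x):
    return 1.0 - x


ATOMS = ["Win(White)", "Win(Black)"]

# Ground laws (29) and (30): head atom and body evaluated under a valuation.
LAWS = [
    ("Win(White)", lambda v: neg(v["Win(Black)"])),  # (29)
    ("Win(Black)", lambda v: neg(v["Win(White)"])),  # (30)
]


def potential(true_atoms, atoms, laws):
    """Limit of a hypothetical derivation sequence (assumed reading, see lead-in)."""
    v = {a: (T if a in true_atoms else F) for a in atoms}
    changed = True
    while changed:
        changed = False
        for head, body in laws:
            if body(v) != F and v[head] == F:
                v[head] = U  # the head might still be caused later on
                changed = True
    return v


def blocked(v, laws):
    """Heads of laws whose body is unknown in the potential, so they may not fire."""
    return [head for head, body in laws if body(v) == U]


root_potential = potential(set(), ATOMS, LAWS)
blocked_at_root = blocked(root_potential, LAWS)
```

At the root both atoms end up with value u, so both bodies evaluate to u and both laws are blocked, which is the situation described in the text.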

Theories which have no execution models are obviously not of interest. This 
motivates the following definition. 

Definition 13 (valid CP-theories) 

A CP-theory C is valid in an interpretation X for its exogenous predicates if it has 
at least one execution model in context X . If C is valid in all contexts X , we simply 
say that C is valid. 

Clearly, it is only if C is a valid CP-theory that we can associate a probability 
distribution $\pi_C$ to it. The theories of Example 6 and Example 8 are valid.

The above discussion raises the question how to recognize whether a theory is 
valid. We now propose a simple syntactic criterion that guarantees this. 

Definition 14 (stratified CP-theories) 

A CP-theory C is stratified if there exists a function $\lambda$ from the set of its atoms to
an interval [0..n] such that C strictly respects $\lambda$.

Here, it is possible that the function $\lambda$ is a timing such as in Section 6.3, but this
is not necessary; e.g., it might be the case that $\lambda$ assigns different natural numbers
to atoms that conceptually, in their intended interpretation, are supposed to refer
to the same time points. The following corollary of Theorem 7 is of relevance both
to temporal and atemporal CP-theories. 

Corollary 1 

Each stratified theory C has an execution model. 

We remark that, in particular, all positive theories are stratified, because, for
such a theory, we can simply assign 0 to all ground atoms. An example of a stratified
theory containing negation is given in Example 10. The theory of Example
8 is not stratified because the atoms Accepts(S1, x) and Accepts(S2, x) cannot be
ordered in time, since the times at which they are made true depend on who is the
master. This is an example of a valid but unstratified CP-theory. We therefore
conclude that the existence of a stratification is a sufficient condition for the existence
of an execution model — and hence of the theory actually defining a probability 
distribution — but not a necessary one. 

6.5 The representation of time in CP-logic 

In the preceding sections, we have encountered two quite different styles of know- 
ledge representation: temporal theories explicitly include time, while atemporal 
theories make abstraction of it. 
There may be several reasons for making time explicit. One obvious reason is if
we are actually interested in the intermediate states of the process. Other reasons 
might be that the causal processes in a domain are simply too complex to model 
without explicit time. Below, we illustrate two such cases. 

In CP-logic, each atom starts out as false and might become true during the 
process; moreover, if at some point an atom becomes true, it will remain true. In 
applications where the obvious relevant properties of the domain of interest do not 
behave like this, we cannot simply represent them by atoms in our CP-theory. As 
already mentioned in Section 4, this problem can typically be solved by explicitly
including time in the representation. The following example illustrates how this 
methodology can be used to handle domains in which there are causes for both a 
property and its negation. 

Example 12 

Consider the following variant of Example 2, in which a doctor can now administer
a medicine to suppress chest pain with probability 0.9.

$Pneumonia(1).$ (31)
$\forall d\ (Pneumonia(d + 1) : 0.8) \leftarrow Pneumonia(d).$ (32)
$\forall d\ (Chestpain(d) : 0.6) \leftarrow Pneumonia(d) \land \neg Suppressed(d).$ (33)
$\forall d\ Medicine(d) \leftarrow Chestpain(d).$ (34)
$\forall d\ (Suppressed(d + 1) : 0.9) \leftarrow Medicine(d).$ (35)



In this representation, the use of negation allows the predicate Suppressed to act
as a cause for not having chest pain.

We now discuss another type of application that requires time to be made ex- 
plicit. As mentioned before, temporal precedence might give unintended results for 
theories which are not temporal or whose granularity of time is such that nega- 
tion occurs in instantaneous events. In such cases, the obvious solution is to make 
time explicit and ensure it is fine-grained enough to make all events with negation 
non-instantaneous.⁵ To illustrate, we consider the following refinement of Example 9.

Example 13 

We again consider the setting of Example 9, where the slave does not necessarily
wait for the decision of the master, before deciding whether to accept the request 
himself. This might be the case, for instance, in a system where the two servers have 
not been properly synchronized. As we explained, for such a model to be a complete 

5 For most real world events, there exists, at least in principle, some time scale that would make 
them non-instantaneous. For instance, even an event such as the temperature of a gas increasing 
when the space in which it is contained decreases, only manifests itself after the molecules of 
the gas have travelled a certain microscopic distance, which does take a — small, but in principle 
non-zero — amount of time. Examples of truly instantaneous events can be found in quantum 
mechanics (if the state of one object collapses, this instantaneously causes the collapse of the 
state of each entangled object) and abstract properties defined by social convention (e.g., signing 
a purchase deed instantaneously makes one the owner of a house). 
description of a probability distribution, we then also need to include information
about the probability that the slave decides before the master. We will assume that, 
at each time point where the master has not decided yet, there is a probability of 
0.2 that he will decide; for the slave, we assume that this probability is 0.8. 

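$(Master(S1) : 0.5) \lor (Slave(S1) : 0.5) \leftarrow.$
$Master(S2) \leftarrow Slave(S1).$
$Slave(S2) \leftarrow Master(S1).$
$\forall x \forall s \forall t\ (Decides(x, s, t) : 0.2) \leftarrow Master(x) \land Application(s) \land \neg \exists t'\ (t' < t \land Decides(x, s, t')).$
$\forall x \forall s \forall t\ (Accepts(x, s, t) : 0.6) \leftarrow Master(x) \land Decides(x, s, t).$
$\forall x \forall s \forall t\ (Decides(x, s, t) : 0.8) \leftarrow Slave(x) \land Application(s) \land \neg \exists t'\ (t' < t \land Decides(x, s, t')).$
$\forall x \forall s \forall t\ Accepts(x, s, t) \leftarrow Slave(x) \land Decides(x, s, t) \land \neg \exists y \exists t'\ (Master(y) \land t' < t \land Accepts(y, s, t')).$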

In this CP-theory, we have introduced the predicate Decides(x, s, t) as a reification
of the events by which the servers reach their decision (i.e., the events that were
described by (22) and (23) in our original theory from Example 8). The meaning
of this predicate is that server x makes his decision on application s at time t. The
above CP-theory models the situation of an eager slave that decides on applications
much faster than the master, which causes many services to be provided twice.
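
To get a feeling for how often a service is provided twice, the following Python sketch computes that probability for a single application. It is only an illustration under assumptions that the example does not state: time points are taken to be 1, ..., `horizon`, the application is present, and the function names are hypothetical. The service is provided twice exactly when the master decides and accepts at some time that is not strictly earlier than the slave's decision.

```python
def first_decision_pmf(p, horizon):
    """P(the first decision happens at time k), for k = 1..horizon."""
    return [p * (1 - p) ** (k - 1) for k in range(1, horizon + 1)]


def double_service_probability(horizon, p_master=0.2, p_slave=0.8, p_accept=0.6):
    """Probability that both servers accept the same application."""
    master = first_decision_pmf(p_master, horizon)
    slave = first_decision_pmf(p_slave, horizon)
    total = 0.0
    for t_slave, p_s in enumerate(slave, start=1):
        for t_master, p_m in enumerate(master, start=1):
            # The slave accepts unless the master accepted strictly earlier,
            # so both accept when the master accepts at the same time or later.
            if t_master >= t_slave:
                total += p_s * p_m * p_accept
    return total
```

With a single time point the value is 0.8 × 0.2 × 0.6; as the horizon grows it approaches 0.6 × 0.8 / 0.84, roughly 0.57, which gives a number to the observation that many services are provided twice.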

7 The relation to Bayesian networks 

In this section, we investigate the relation between CP-logic and Bayesian networks. 
Before we begin, let us briefly recall the definition of a Bayesian network. Such a 
network consists of a directed acyclic graph and a number of probability tables. 
Every node n in the graph represents a random variable, which has some domain 
dom(n) of possible values. A network B defines a unique probability distribution $\pi_B$
over the set of all possible assignments $n_1 = v_1, \ldots, n_m = v_m$ of values to all of these
random variables, with all $v_i \in dom(n_i)$. First, this $\pi_B$ must obey a probabilistic
independence assumption expressed by the graph, namely, that every node n is
probabilistically independent of all of its non-descendants, given its parents. This
allows the probability $\pi_B(n_1 = v_1, \ldots, n_m = v_m)$ of such an assignment of values
to all random variables to be rewritten as a product of conditional probabilities
$\prod_i \pi_B(n_i = v_i \mid pa(n_i) = \bar{v})$, where each $pa(n_i)$ is the tuple of all parents of $n_i$ in
the graph. The probability tables associated to the network now specify precisely
all of these conditional probabilities $\pi_B(n_i = v_i \mid pa(n_i) = \bar{v})$. The second condition
imposed on $\pi_B$ is then simply that all of these conditional probabilities must match
the corresponding entries in these tables. It can be shown that this indeed suffices 
to uniquely characterize a single distribution. 

Most commonly, Bayesian networks are constructed without any explicit refer- 
ences to time, since this tends to produce the simplest models. However, in some 
cases such a representation does not suffice; then, one typically uses a so-called
dynamic Bayesian network (Ghahramani 1998) which makes time explicit in much 
the same way as can be done in CP-logic. 

Like CP-logic, Bayesian networks are a formal language that can be used to 
represent causal relations in a domain. This is done by choosing as the parents of a
node x all nodes y for which it is the case that the value of y has a direct effect on
the value of x. The values in the conditional probability table for x then quantify
the joint effect that all of these parents together have on x. Bayesian networks
constructed in this way are usually called causal networks, to distinguish them
from "non-causal" networks which do not necessarily follow the direction of causal
relations. Causal Bayesian networks are more informative than non-causal ones:
not only do they define a probability distribution, but they also specify what will
happen when external action intervenes with the normal operation of the causal
mechanisms they describe (Pearl 2000).

In this section, we will compare causal Bayesian networks to CP-logic. We first 
show that, because Bayesian networks can easily be unfolded into probability trees 
(Shafer 1996), they can be mapped to CP-logic in a straightforward way. We then
discuss how CP-logic differs from Bayesian networks. There are essentially three 
main differences. Our representation is more fine-grained and modular in the sense 
that a single probabilistic causal law can express the effect that some of the "par- 
ents" of an atom have on it, regardless of the effect of others. It is also more 
qualitative, since we can use first-order formulas to specify in which circumstances 
the "parents" will have a certain effect on the child, while Bayesian networks encode 
such information in probability tables. Finally, it is also more general, in the sense 
that it can directly represent cyclic causal relations, which a Bayesian network can- 
not. We remark that these comparisons consider only the "vanilla" way of writing 
down Bayesian networks, i.e., as a drawing of a directed acyclic graph accompanied 
by tables of numbers. A large number of alternative notations exist in the literature,
e.g., (Comley and Dowe 2003). These provide more elegant ways of handling
all but one of the "shortcomings" of Bayesian networks that we will mention — the 
exception being, to the best of our knowledge, their inability to directly represent 
cyclic causal relations. 

After this discussion of representation issues, Section 7.5 will discuss interventions
in causal Bayesian networks and show that the semantics of CP-logic induces a 
natural counterpart to this notion. 



7.1 Bayesian networks in CP-logic 

As also mentioned in Shafer's book, a Bayesian network can be seen as a description 
of a class of probability trees. We first make this more precise. To make it easier to 
compare to CP-logic later on, we will start by introducing a logical vocabulary for 
describing a Bayesian network. 

Definition 15 

Let B be a Bayesian network. The vocabulary $\Sigma_B$ consists of a predicate symbol
$P_n$ for each node n of B and a constant $C_v$ for each value v in the domain of n.
Fig. 9. Bayesian network for the sprinkler example.

Fig. 10. Process corresponding to the sprinkler Bayesian network.



Now, we want to relate a Bayesian network B to a class of $\Sigma_B$-processes. Intuitively,
we are interested in those processes where the flow of events follows the
structure of the graph and every event propagates the values of the parents of a 
node to this node itself. We illustrate this by the following famous example. 

Example 14 (Sprinkler)

The grass can be wet because it has rained or because the sprinkler was on. The 
probability of the sprinkler causing the grass to be wet is 0.8; the probability of 
rain causing the grass to be wet is 0.9; and the probability of the grass being wet 
if both the sprinkler is on and it is raining is 0.99. The a priori probability of rain 
is 0.4 and that of the sprinkler having been on is 0.2. 

The Bayesian network formalization of this example can be seen in Figure 9.
Figure 10 shows a process that corresponds to this network. Here, we have exploited
the fact that all random variables of the Bayesian network are boolean, by representing
every random variable by a single atom, i.e., writing for instance Wet and
$\neg$Wet instead of Wet(True) and Wet(False). Formally, we define the following
class of processes for a Bayesian network.

Definition 16 

Let B be a Bayesian network. A B-process is a probabilistic $\Sigma_B$-process $\mathcal{T}$ for which
there exists a mapping $\mathcal{N}$ from nodes of $\mathcal{T}$ to nodes of B, such that the following
conditions are satisfied. For every branch of $\mathcal{T}$, $\mathcal{N}$ is a one-to-one mapping between
the nodes on this branch and the nodes of B, which is order preserving, in the sense
that, for all s, s' on this branch, if $\mathcal{N}(s)$ is an ancestor of $\mathcal{N}(s')$ in B, then s must
be an ancestor of s' in $\mathcal{T}$. If $\mathcal{N}(s)$ is a node n with domain $\{v_1, \ldots, v_k\}$ and parents
$p_1, \ldots, p_m$ in B, then the children of s in $\mathcal{T}$ are nodes $s_1, \ldots, s_k$, for which:

• $\mathcal{I}(s_i) = \mathcal{I}(s) \cup \{P_n(C_{v_i})\}$;

• The edge from s to $s_i$ is labeled with the entry in the table for n that gives
the conditional probability of $n = v_i$ given $p_1 = w_1, \ldots, p_m = w_m$, where each
$w_j$ is the unique value from the domain of $p_j$ for which $P_{p_j}(C_{w_j}) \in \mathcal{I}(s)$.

It should be clear that every leaf s of such a B-process $\mathcal{T}$ describes an assignment
of values to all nodes of B, i.e., every node n is assigned the unique value v for
which $P_n(C_v) \in \mathcal{I}(s)$. Moreover, the probability $\mathcal{P}(s)$ of such a leaf is precisely
the product of all the appropriate entries in the various conditional probability
distributions. Therefore, the distribution $\pi_{\mathcal{T}}$ coincides with the distribution defined
by the network B.

We now construct a CP-theory $CP_B$, such that the execution models of $CP_B$
will be precisely all B-processes. We first illustrate this process by showing how the
Bayesian network in Figure 9 can be transformed into a CP-theory.

Example 14 (Sprinkler — cont'd) 

We can derive the following CP-theory from the Bayesian network in Figure 9.

$(Wet : 0.99) \leftarrow Sprinkler \land Rain.$
$(Wet : 0.8) \leftarrow Sprinkler \land \neg Rain.$
$(Wet : 0.9) \leftarrow \neg Sprinkler \land Rain.$
$(Wet : 0.0) \leftarrow \neg Sprinkler \land \neg Rain.$
$(Sprinkler : 0.2).$
$(Rain : 0.4).$



Again, this example exploits the fact that the random variables are all boolean, by
using the more readable representation of Wet and $\neg$Wet instead of Wet(True)
and Wet(False). It should be obvious that the process in Figure 10 is an execution
model of this theory and, therefore, that this theory defines the same probability
distribution as the Bayesian network.
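
As a numerical check of this correspondence, the leaves of a process for the sprinkler network can be enumerated in a few lines of Python. This is a sketch only: the dictionary layout and the function name `leaf_distribution` are illustrative assumptions, and the numbers are those of the sprinkler example. Each leaf is identified with the set of atoms that are true in it, and its probability is the product of the table entries along its branch.

```python
from itertools import product

# For each node: its parents and P(node is true | parent values).
NETWORK = {
    "Sprinkler": ((), {(): 0.2}),
    "Rain": ((), {(): 0.4}),
    "Wet": (("Sprinkler", "Rain"), {
        (True, True): 0.99,
        (True, False): 0.8,
        (False, True): 0.9,
        (False, False): 0.0,
    }),
}


def leaf_distribution(order):
    """Map each leaf (set of true atoms) to the probability of its branch.

    `order` is the order in which the nodes are handled along every branch.
    """
    leaves = {}
    for values in product([True, False], repeat=len(order)):
        assignment = dict(zip(order, values))
        prob = 1.0
        for node in order:
            parents, table = NETWORK[node]
            p_true = table[tuple(assignment[p] for p in parents)]
            prob *= p_true if assignment[node] else 1.0 - p_true
        leaves[frozenset(n for n in order if assignment[n])] = prob
    return leaves


by_sprinkler_first = leaf_distribution(["Sprinkler", "Rain", "Wet"])
by_rain_first = leaf_distribution(["Rain", "Sprinkler", "Wet"])
p_wet = sum(p for leaf, p in by_sprinkler_first.items() if "Wet" in leaf)
```

Handling Sprinkler before Rain or Rain before Sprinkler yields the same leaf probabilities, which illustrates why all processes of this class define one and the same distribution.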

It is now easy to see that the encoding used in the above example generalizes.
Concretely, for every node n with parents $p_1, \ldots, p_m$ and domain $\{v_1, \ldots, v_k\}$, we
should construct the set of all rules of the form:

$(P_n(C_{v_1}) : \alpha_1) \lor \cdots \lor (P_n(C_{v_k}) : \alpha_k) \leftarrow P_{p_1}(C_{w_1}) \land \cdots \land P_{p_m}(C_{w_m}),$

where each $w_i$ belongs to the domain of $p_i$ and each $\alpha_j$ is the entry for $n = v_j$,
given $p_1 = w_1, \ldots, p_m = w_m$, in the CPT for n. Let us denote the CP-theory thus
constructed by $CP_B$. The following result is then obvious.
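
The general encoding can be sketched as a small Python function that prints one rule per combination of parent values. The textual syntax of the generated rules (`v` for disjunction, `&` for conjunction, `<-` for the arrow) and the name `cp_laws` are assumptions made for this illustration only.

```python
from itertools import product


def cp_laws(node, domain, parents, parent_domains, cpt):
    """Generate one CP-law (as a string) per combination of parent values.

    cpt maps a tuple of parent values to a dict from node values to probabilities.
    """
    laws = []
    for ws in product(*(parent_domains[p] for p in parents)):
        head = " v ".join(f"(P_{node}(C_{v}) : {cpt[ws][v]})" for v in domain)
        body = " & ".join(f"P_{p}(C_{w})" for p, w in zip(parents, ws))
        laws.append(f"{head} <- {body}." if body else f"{head}.")
    return laws


booleans = ["True", "False"]
rain_laws = cp_laws("Rain", booleans, [], {}, {(): {"True": 0.4, "False": 0.6}})
wet_cpt = {
    ("True", "True"): {"True": 0.99, "False": 0.01},
    ("True", "False"): {"True": 0.8, "False": 0.2},
    ("False", "True"): {"True": 0.9, "False": 0.1},
    ("False", "False"): {"True": 0.0, "False": 1.0},
}
wet_laws = cp_laws("Wet", booleans, ["Sprinkler", "Rain"],
                   {"Sprinkler": booleans, "Rain": booleans}, wet_cpt)
```

A node without parents yields a single rule with an empty body, and a node with m boolean parents yields $2^m$ rules, one per row of its table.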

Theorem 8 
Let B be a Bayesian network. Every B-process $\mathcal{T}$ is an execution model of the
CP-theory $CP_B$, i.e., $\mathcal{T} \models CP_B$. Therefore, the semantics of B coincides with the
distribution $\pi_{CP_B}$.

This result shows that CP-logic offers a straightforward way of representing 
Bayesian networks. We now discuss three ways in which it offers more expressivity. 

7.2 Multiple causes for the same effect 

In a process corresponding to a Bayesian network, the value of each random variable 
is determined by a single event. CP-logic, on the other hand, allows multiple events 
to affect the same property. This leads to better representations for effects that have 
a number of independent causes. Let us illustrate this by the following example. 

Example 15 

We consider a game of Russian roulette that is being played with two guns, one in 
the player's left hand and one in his right, each of which has a bullet in one of its 
six chambers. 

$(Death : 1/6) \leftarrow Pull\_trigger(Left\_gun).$
$(Death : 1/6) \leftarrow Pull\_trigger(Right\_gun).$

In this example, there are two "causal mechanisms" that might lead to Death:
one is the fact that pulling the trigger of the left gun might cause a bullet to hit the
person's left temple, and the other is the fact that pulling the trigger of the right gun
might cause a bullet to hit the person's right temple. They are independent in the
following sense: once we know how many and which of these mechanisms are actually
activated (i.e., which of the two triggers are pulled), then observing whether one of
these possible causes actually results in the effect (i.e., whether one of the bullets is
actually fired and kills the person) provides no information about whether one of
the other causes will cause the effect (i.e., whether one of the other bullets is also
fired). Mathematically, this is of course saying nothing more than that the probability
of the effect occurring should be equal to the result of applying a noisy-or⁶ to the
multiset of the probabilities with which each of the causal mechanisms that are
actually activated causes the effect, i.e., if both guns are fired, the probability of
death should be $1 - (1 - 1/6)^2$. This independence is precisely the condition that
is required in order to be able to represent each of these two causal mechanisms by
a separate CP-law, as in the above example. To succinctly describe this situation,
we say that Pull_trigger(Left_gun) and Pull_trigger(Right_gun) are independent
causes for Death.
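
The noisy-or computation for the Russian roulette example can be checked with a short Python sketch. The function name `noisy_or` is an illustrative choice, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from math import prod


def noisy_or(probabilities):
    """Probability that at least one of several independent causes succeeds."""
    return 1 - prod((1 - p for p in probabilities), start=Fraction(1))


both_guns = noisy_or([Fraction(1, 6), Fraction(1, 6)])
one_gun = noisy_or([Fraction(1, 6)])
no_gun = noisy_or([])
```

With both triggers pulled the result is 11/36, with a single trigger it is 1/6, and with no trigger pulled it is 0.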

For instance, in Example 14, Rain and Sprinkler were not independent causes 
for Wet, since the probability of Wet given both Rain and Sprinkler is 0.99, which 
is not equal to $1 - (1 - 0.8)(1 - 0.9) = 0.98$. In this case, the causal mechanisms

6 The noisy-or maps a multiset of probabilities $\alpha_i$ to $1 - \prod_i (1 - \alpha_i)$.
Fig. 11. A Bayesian network for Example 15.



by which Rain and Sprinkler cause Wet therefore appear to reinforce each other:
it might be that a light drizzle only causes the grass to get slightly moist, and 
that sometimes the pressure on the water main is so low that the sprinkler by 
itself cannot get the grass really wet, but that a light drizzle and a lightly spraying 
sprinkler together would be enough to cause Wet, even though neither of them 
separately would do the trick. Because Sprinkler and Rain are not independent 
causes in this example, we cannot use a representation of the form: 

$(Wet : \alpha) \leftarrow Sprinkler.$
$(Wet : \beta) \leftarrow Rain.$

and instead have no choice but to use the representation shown in Example 14.

Figure 11 shows a Bayesian network for the Russian roulette example. The most
obvious difference between this representation and ours concerns the independence 
between the two different causes for death. In the CP-theory, this independence 
is expressed by the structure of the theory, whereas in the Bayesian network, it is 
a numerical property of the probabilities in the conditional probability table for 
Death. Because of this, the CP-theory is more elaboration tolerant, since adding or 
removing an additional cause for Death simply corresponds to adding or removing 
a single CP-law. Moreover, its representation is also more compact, requiring, in 
general, only n probabilities for n independent causes, instead of the $2^n$ entries that
are needed in a Bayesian network table. Of course, these entries are nothing more
than the result of applying a noisy-or to the multiset of the probabilities with
which each of the causes that are present actually causes the effect.
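
The gap between the n probabilities of the CP-theory and the $2^n$ table entries can be made concrete with a Python sketch that expands the former into the latter. The function name `noisy_or_table` and the dictionary layout are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product
from math import prod


def noisy_or_table(cause_probs):
    """Expand n independent causes into the 2^n-row table for their effect.

    cause_probs maps each cause to the probability that it produces the effect.
    Rows are keyed by a tuple of booleans saying which causes are present.
    """
    causes = list(cause_probs)
    table = {}
    for present in product([True, False], repeat=len(causes)):
        # probability that every cause that is present fails to produce the effect
        all_fail = prod(1 - cause_probs[c] for c, on in zip(causes, present) if on)
        table[present] = 1 - all_fail
    return table


roulette = noisy_or_table({
    "Pull_trigger(Left_gun)": Fraction(1, 6),
    "Pull_trigger(Right_gun)": Fraction(1, 6),
})
```

Adding a third independent cause means adding one entry to the input dictionary, while the generated table doubles in size.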

In graphical modeling, it is common to consider variants of Bayesian networks, 
that use more sophisticated representations of the required conditional probability 
distributions than a simple table. Including the noisy-or as a structural element 
in such a representation achieves the same effect as CP-logic when it comes to 
representing independent causes. 



The noisy-or maps a multiset of probabilities $\alpha_i$ to $1 - \prod_i (1 - \alpha_i)$.
7.3 First-order logic representation of causes

In CP-logic, the cause of an event can be represented in a qualitative way, by 
means of a first-order formula. Bayesian networks, on the other hand, encode such 
information in the probability tables. 

Example 16 

In the so-called Wumpus world, an agent moves through a grid, which contains, 
among other things, a number of bottomless pits. One aspect of this world is that 
if a position x is next to such a pit, then with a certain probability $\alpha$, a breeze
can be felt there (often, $\alpha$ is simply taken to be 1). In CP-logic, we could write the
following CP-law:

$\forall x\ (Breeze(x) : \alpha) \leftarrow \exists y\ NextTo(x, y) \land Pit(y).$

For a grid in which square A is surrounded by squares B, C, D and E, a Bayesian
network could represent the effect of Pit(B), Pit(C), Pit(D), Pit(E) on Breeze(A)
by the following table:

                                                          | Breeze(A)
$Pit(B), Pit(C), Pit(D), Pit(E)$                          | $\alpha$
$Pit(B), Pit(C), Pit(D), \neg Pit(E)$                     | $\alpha$
...                                                       | ...
$\neg Pit(B), \neg Pit(C), \neg Pit(D), Pit(E)$           | $\alpha$
$\neg Pit(B), \neg Pit(C), \neg Pit(D), \neg Pit(E)$      | 0


In this example, CP-logic offers a representation which is considerably more
concise than that of the Bayesian network. This manifests itself in two ways: first, our
first-order representation succeeds in defining the probability of Breeze(x) for all
squares x simultaneously by a single CP-law, while each square would need its own
(identical) probability table in the Bayesian network; second, it can also summarize
the $2^4$ entries that make up the probability table for each Breeze(x) by the
single first-order precondition of this CP-law. Again, these shortcomings have already
been recognized by the Bayesian network community, leading to, for instance, the
use of decision trees to represent probability tables (Comley and Dowe 2003),
various forms of parameter tying and first-order versions of Bayesian networks such as
Bayesian Logic Programs (Kersting and De Raedt 2000) (see also Section 9.4.3).

We remark that this feature of CP-logic cannot really be seen separately from 
that discussed in the previous section: it is precisely because we split up the effect
that "parents" have on their "child" into a number of independent causal laws, that 
we get more opportunity to exploit the expressivity of our first-order representation. 
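As an illustration of how the single first-order precondition summarizes a whole table, the sketch below grounds the Breeze CP-law for one square and lists the entry for every pit configuration of its neighbours. The function name and the value 0.8 for $\alpha$ are assumptions made for this example only.

```python
from itertools import product

def breeze_table(neighbours, alpha):
    """Ground the Breeze CP-law for one square.

    Returns a dict mapping each pit configuration of the neighbours (a tuple
    of booleans, in the order of `neighbours`) to the probability of a breeze.
    """
    rows = {}
    for pits in product([True, False], repeat=len(neighbours)):
        # Precondition of the CP-law: some neighbouring square contains a pit.
        body_holds = any(pits)
        rows[pits] = alpha if body_holds else 0.0
    return rows

# alpha = 0.8 is an arbitrary illustrative value.
table = breeze_table(["B", "C", "D", "E"], alpha=0.8)
```

The same function can be called for any square and any number of neighbours, which mirrors the way one quantified CP-law covers every square of the grid.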

7.4 Cyclic causal relations

[Fig. 12. Bayesian network for the angina-pneumonia causal loop. Figure not reproduced; the surviving fragment shows an edge from external(angina) to angina.]

In real life, probabilistic processes may consist of events that might propagate values
in opposite directions. We already saw this in Example 2, where angina could cause
pneumonia, but, vice versa, pneumonia could also cause angina. In CP-logic, such
causal loops do not require any special treatment. For instance, the loop formed by
the two CP-laws

$(Angina : 0.2) \leftarrow Pneumonia.$
$(Pneumonia : 0.3) \leftarrow Angina.$

correctly behaves as follows: 

• If the patient has neither angina nor pneumonia by an external cause ('exter- 
nal' here does not mean exogenous, but simply that this cause is not part of 
the causal loop), then he will have neither; 

• If the patient has angina by an external cause, then with probability 0.3 he 
will also have pneumonia; 

• If the patient has pneumonia by an external cause, then with probability 0.2 
he will also have angina; 

• If the patient has both pneumonia and angina by an external cause, then he 
will obviously have both. 
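The four cases listed above can be checked with a small enumeration. This is only a sketch: it uses the instance-style reading discussed in Section 8 (each law independently either would or would not fire when triggered) rather than probability trees, and it models an external cause simply as an atom that holds initially. The names `LAWS` and `outcome_distribution` are hypothetical.

```python
from itertools import product

# The two CP-laws of the loop, written as (effect, probability, cause).
LAWS = [("angina", 0.2, "pneumonia"), ("pneumonia", 0.3, "angina")]

def outcome_distribution(external):
    """Distribution over final states, given the externally caused atoms."""
    dist = {}
    for fires in product([True, False], repeat=len(LAWS)):
        # Probability of this combination of "would fire" decisions.
        weight = 1.0
        for (_, p, _), f in zip(LAWS, fires):
            weight *= p if f else 1.0 - p
        # Start from the external atoms and apply firing laws to a fixpoint.
        state = set(external)
        changed = True
        while changed:
            changed = False
            for (effect, _, cause), f in zip(LAWS, fires):
                if f and cause in state and effect not in state:
                    state.add(effect)
                    changed = True
        key = frozenset(state)
        dist[key] = dist.get(key, 0.0) + weight
    return dist

nothing = outcome_distribution(set())
only_angina = outcome_distribution({"angina"})
only_pneumonia = outcome_distribution({"pneumonia"})
```

Because every state is built up from the external atoms only, neither disease can appear merely because the other law mentions it, which is the behaviour described in the first case.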

In order to get the same behaviour in a Bayesian network, this would have to be
explicitly encoded. For instance, one could introduce new, artificial random
variables external(angina) and external(pneumonia) to represent the possibility that
angina and pneumonia result from an external cause and construct the Bayesian
network that is shown in Figure 12. In general, to encode a causal loop formed
by n properties, one would introduce n additional nodes, i.e., all of the n original
properties would have the same n artificial nodes as parents.



7.5 Interventions in CP-logic

Pearl's work investigates the behaviour of causal models in the presence of in- 
terventions, i.e., outside manipulations that preempt the normal behaviour of the 
system. His key observation here is that causal relations are robust, in the sense 
that, even when some causal relations are intervened with, the other causal rela- 
tions will still hold as before. Formally, an intervention for Pearl is something of the 
form do(X = x) that sets the value of a random variable X to x. In doing this, all 
edges leading into X are removed, because even though they represent the causal
mechanism that would normally determine the value of X, they become irrelevant
when the value of X is determined by an external intervention instead. 

It is also possible to consider such interventions in the context of CP-logic. Our
representation of a causal system is a modular one, in which the atomic unit is
a single CP-law. Because of this, our language comes with a specific kind of
intervention "built-in": if we want to know what the result is of intervening on
a single causal law r, we can simply consider the theory from which this one law
is removed (and possibly replaced by some other law). So, to judge the effect of
an intervention that prevents r, we simply have to look at $\pi_{C \setminus \{r\}}$ instead
of $\pi_C$, in (roughly) the same way that Pearl would look at something of the form
$P(\cdot \mid do(X = x))$ instead of $P(\cdot)$. The opposite is of course also possible: if we want
to know the effect of an intervention that instead establishes an additional
causal law r, we could look at $\pi_{C \cup \{r\}}$.

To illustrate, let us consider another medical example. A tumor in a patient's
kidney might cause kidney failure, which might cause the death of the patient;
however, to make matters even worse, the tumor can also metastasize to the brain,
which might also, independently, kill the patient. We can represent this as:

$(KidneyFailure : 0.1) \leftarrow KidneyTumor.$
$(BrainTumor : 0.1) \leftarrow KidneyTumor.$
$(Death : 0.5) \leftarrow BrainTumor.$
$(Death : 0.9) \leftarrow KidneyFailure.$

Now, let us suppose that we want to know what the effect will be of putting the
patient on a dialysis machine, which allows him to survive kidney failure. To answer
this question, we simply remove the last of these causal laws (since the dialysis is
precisely meant to prevent this particular causality from taking effect) and look at
the semantics of the resulting theory. If, say, the dialysis also carries some small
risk, we can also add new causal laws such as:

$(Death : 0.01) \leftarrow Dialysis.$

In this way, the semantics of CP-logic already carries within it a notion of
intervention, which is also slightly more fine-grained than that of Pearl, since we can
consider interventions that prohibit a single causal law (as in the above example),
whereas Pearl only considers interventions that prohibit all causal laws that affect
the value of a certain random variable. In the case of the above example, therefore,
we would have to either intervene to prevent all possible causes for death, including
the brain tumor, or none at all. Admittedly, it is of course easy enough to solve
this problem by introducing some intermediate variable, say "high levels of toxins
in the blood", between kidney failure and death.

Among other things, Pearl uses interventions to make sense of the statistical
phenomenon known as Simpson's paradox. Because this is somewhat of a benchmark
for causal formalisms, we will briefly discuss how CP-logic can deal with it.
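Before turning to Simpson's paradox, the effect of the dialysis intervention in the kidney tumor example can be worked out numerically. The computation below is a sketch under two assumptions that are not stated in the text: KidneyTumor is taken to hold, and, in the third case, Dialysis is taken to hold as well. Since each CP-law fires independently of the others, the patient survives exactly when neither causal chain runs to completion:

```latex
\begin{align*}
P(Death) &= 1 - (1 - 0.1 \cdot 0.9)(1 - 0.1 \cdot 0.5) = 1 - 0.91 \cdot 0.95 = 0.1355
  && \text{(original theory)} \\
P(Death) &= 0.1 \cdot 0.5 = 0.05
  && \text{(law } Death \leftarrow KidneyFailure \text{ removed)} \\
P(Death) &= 1 - (1 - 0.05)(1 - 0.01) = 1 - 0.9405 = 0.0595
  && \text{(dialysis risk law added)}
\end{align*}
```

Under these assumptions, removing the single law lowers the probability of death from 0.1355 to 0.05, and the additional dialysis risk raises it only slightly, to 0.0595.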
Simpson's paradox refers to the phenomenon that a population can sometimes
be partitioned in such a way that a certain outcome has a low probability in each of
the partitioning sets, yet has a high probability in the population overall (or vice
versa). For instance, a certain drug might be harmful to both men and women, but
appear beneficial for persons of unknown sex. More precisely, taking the drug and
recovering might be positively correlated in the population at large, but become
negatively correlated when conditioning on sex. This can happen, for instance,
if men are both more likely to take the drug and to spontaneously recover from
the disease. (Because then observing that a patient takes the drug increases the
probability that he is male, which in turn increases the probability that he will
recover on his own, thus erroneously suggesting that the drug had some beneficial
effects on this recovery.)

The crux of Simpson's paradox is that the same probability distribution can be
generated by different sets of causal laws, and that in order to figure out whether,
e.g., some drug has a positive effect on a patient's condition, it is really these causal
laws that matter. Pearl's book shows that the paradox can therefore be resolved by
considering the causal models behind the probability distribution. In CP-logic, we
can do the same. Let us illustrate this by a famous real-world example: it was found
that women had a significantly lower acceptance rate than men for the graduate
school of the University of California at Berkeley, which led to a discrimination
lawsuit against the university. However, it turned out that none of the individual
departments of the university had a lower acceptance rate for women than for men;
instead, it was simply the case that women were significantly more likely to apply
to departments with a low acceptance rate.

A highly simplified model of the real situation might therefore have looked some- 
thing like this: 



$\forall x\ (Apply(x, Engineering) : 0.7) \lor (Apply(x, Literature) : 0.3) \leftarrow Man(x).$
$\forall x\ (Apply(x, Engineering) : 0.2) \lor (Apply(x, Literature) : 0.8) \leftarrow Woman(x).$
$\forall x\ (Accepted(x) : 0.6) \leftarrow Apply(x, Engineering).$
$\forall x\ (Accepted(x) : 0.3) \leftarrow Apply(x, Literature).$



Here, there clearly is no gender discrimination: the gender of the applicant plays 
no role in the CP-laws that describe how the university decides whether to accept 
an application. The reason for the lawsuit is that, if we only look at the acceptance
rates of men and women for the university as a whole, we cannot distinguish this 
CP-theory from, e.g., the following one:

$\forall x\ (Apply(x, Engineering) : 0.1) \lor (Apply(x, Literature) : 0.9) \leftarrow Man(x).$
$\forall x\ (Apply(x, Engineering) : 0.4) \lor (Apply(x, Literature) : 0.6) \leftarrow Woman(x).$
$\forall x\ (Accepted(x) : 0.3) \leftarrow Apply(x, Engineering) \land Woman(x).$
$\forall x\ (Accepted(x) : 0.6) \leftarrow Apply(x, Engineering) \land Man(x).$
$\forall x\ (Accepted(x) : 0.4) \leftarrow Apply(x, Literature) \land Woman(x).$
$\forall x\ (Accepted(x) : 0.5) \leftarrow Apply(x, Literature) \land Man(x).$

Indeed, both theories yield an acceptance rate of 0.36 for women and an
acceptance rate of 0.51 for men. In the lawsuit, the acceptance rates of individual
departments were used to argue that the first model was, in fact, the correct one.
Purely theoretically, another option could have been to conduct a randomized ex- 
periment to eliminate selection bias: instead of allowing students to choose their 
department, we would assign it at random. In CP-logic terms, this corresponds to 
the intervention of removing the first two CP-laws from both theories and replacing 
them by: 



$\forall x\ (Apply(x, Engineering) : 0.5) \lor (Apply(x, Literature) : 0.5).$

If we were to perform this intervention, the first theory would predict a new 
acceptance rate of 0.45 for men and women alike, whereas the second would predict 
an acceptance rate of 0.35 for women and 0.55 for men. So, we see here that the kind 
of interventions induced by the semantics of CP-logic is able to explain Simpson's 
paradox, and does so in essentially the same way that Pearl does. 
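The acceptance rates quoted above can be reproduced with a few lines of arithmetic. The sketch below is an illustration only: the dictionary layout and function name are hypothetical, and for men in the second theory it assumes application probabilities of 0.1 (Engineering) and 0.9 (Literature), the split under which that theory yields the stated rate of 0.51.

```python
def acceptance_rate(apply_probs, accept_probs):
    """P(Accepted) = sum over departments of P(apply) * P(accepted | department)."""
    return sum(apply_probs[d] * accept_probs[d] for d in apply_probs)

# Theory 1: acceptance depends only on the department.
apply_1 = {"man": {"Eng": 0.7, "Lit": 0.3}, "woman": {"Eng": 0.2, "Lit": 0.8}}
accept_1 = {"man": {"Eng": 0.6, "Lit": 0.3}, "woman": {"Eng": 0.6, "Lit": 0.3}}

# Theory 2: acceptance also depends on gender.
# The 0.1 / 0.9 split for men is an assumption (see the text above).
apply_2 = {"man": {"Eng": 0.1, "Lit": 0.9}, "woman": {"Eng": 0.4, "Lit": 0.6}}
accept_2 = {"man": {"Eng": 0.6, "Lit": 0.5}, "woman": {"Eng": 0.3, "Lit": 0.4}}

# The intervention: departments are assigned at random.
random_assignment = {"Eng": 0.5, "Lit": 0.5}

observed_1 = {g: acceptance_rate(apply_1[g], accept_1[g]) for g in accept_1}
observed_2 = {g: acceptance_rate(apply_2[g], accept_2[g]) for g in accept_2}
intervened_1 = {g: acceptance_rate(random_assignment, accept_1[g]) for g in accept_1}
intervened_2 = {g: acceptance_rate(random_assignment, accept_2[g]) for g in accept_2}
```

The two `observed_*` dictionaries agree up to rounding, while the `intervened_*` dictionaries differ, which is exactly why the randomized experiment can tell the two theories apart.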



8 CP-logic and logic programs 

There is an obvious similarity between the syntax of CP-logic and that of logic 
programs. Moreover, the constructive processes that we used to define the seman- 
tics of a CP-theory are also similar to the kind of fixpoint constructions used to 
define certain semantics for logic programs. In this section, we will investigate these 
similarities. To be more concrete, we will first define a straightforward probabilistic 
extension of logic programs, called Logic Programs with Annotated Disjunctions, 
and then prove that this is essentially equivalent to CP-logic. 

The connection between causal reasoning and logic programming has long been
implicitly present; we can refer in this respect to, for instance, formalizations of
situation calculus in logic programming (Pinto and Reiter 1993; Van Belleghem et al. 1997).
Here, we now make this relation explicit, by showing that the language of
CP-logic, that we have constructed directly from causal principles, corresponds to
existing logic programming concepts. In this respect, our work is similar to that of
(McCain and Turner 1996), who defined the language of causal theories, which was
then shown to be closely related to logic programming. However, as we will discuss
later, McCain and Turner formalise somewhat different causal intuitions, which
leads to a correspondence to a different logic programming semantics. Our results
from this section will help to clarify the position of CP-logic among related work
in the area of probabilistic logic programming, such as Poole's Independent Choice
Logic (Poole 1997). Moreover, they provide additional insight into the role that
causality plays in such probabilistic logic programming languages, as well as in
normal and disjunctive logic programs.

8.1 Logic Programs with Annotated Disjunctions 

In this section, we define the language of Logic Programs with Annotated Disjunc- 
tions, or LPADs for short. This is a probabilistic extension of logic programming, 
which is based on disjunctive logic programs. This is a natural choice, because dis- 
junctions themselves — and therefore also disjunctive logic programs — already rep- 
resent a kind of uncertainty. Indeed, to give just one example, we could use these 
to model indeterminate effects of actions. Consider, for instance, the following dis- 
junctive rule: 

$Heads \lor Tails \leftarrow Toss.$

This offers a quite intuitive representation of the fact that tossing a coin will result 
in either heads or tails. Of course, this is not all we know. Indeed, a coin also has 
equal probability of landing on heads or tails. The idea behind LPADs is now simply 
to express this by annotating each of the disjuncts in the head with a probability, 
i.e., we write: 

$(Heads : 0.5) \lor (Tails : 0.5) \leftarrow Toss.$

Formally, an LPAD is a set of rules:

$(h_1 : \alpha_1) \lor \cdots \lor (h_n : \alpha_n) \leftarrow \phi,$    (36)

where the $h_i$ are ground atoms and $\phi$ is a sentence. As such, LPADs are syntactically
identical to CP-logic. However, we will define their semantics quite differently. For
instance, the above example will express that precisely one of the following logic
programming rules holds: either $Heads \leftarrow Toss$ holds, i.e., if the coin is tossed this
will yield heads, or the rule $Tails \leftarrow Toss$ holds, i.e., tossing the coin gives tails.
Each of these two rules has a probability of 0.5 of being the actual instantiation of
the disjunctive rule.

More generally, every rule of form (36) represents a probability distribution over
the following set of logic programming rules:

$\{(h_i \leftarrow \phi) \mid 1 \le i \le n\}.$

From these distributions, a probability distribution over logic programs is then
derived. To formally define this distribution, we introduce the following concept
of a selection. We use the notation $head^*(r)$ to denote the set of pairs
$head(r) \cup \{(\emptyset, 1 - \sum_{(h : \alpha) \in head(r)} \alpha)\}$, where $\emptyset$ represents the
possibility that none of the $h_i$'s are caused by the rule r.

Definition 17 (C-selection)

Let C be an LPAD. A C-selection is a function $\sigma$ from C to $\bigcup_{r \in C} head^*(r)$, such
that for all $r \in C$, $\sigma(r) \in head^*(r)$. By $\sigma^h(r)$ and $\sigma^\alpha(r)$ we denote, respectively,
the first and second element of the pair $\sigma(r)$. The set of all C-selections is denoted
as $\mathcal{S}_C$.

The probability $P(\sigma)$ of a selection $\sigma$ is now defined as $\prod_{r \in C} \sigma^\alpha(r)$. For a set
$S \subseteq \mathcal{S}_C$ of selections, we define the probability P(S) as $\sum_{\sigma \in S} P(\sigma)$. By $C^\sigma$ we
denote the logic program that consists of all rules $\sigma^h(r) \leftarrow body(r)$ for which $r \in C$
and $\sigma^h(r) \neq \emptyset$. Such a $C^\sigma$ is called an instance of C. We will interpret these
instances by the well-founded model semantics. Recall that, in general, the
well-founded model of a program P, wfm(P), is a pair (I, J) of interpretations, where
I contains all atoms that are certainly true and J contains all atoms that might
possibly be true. If I = J, then the well-founded model is called exact. Intuitively,
if wfm(P) is exact, then the truth of all atoms can be decided, i.e., everything that
is not false can be derived. In the semantics of LPADs, we want to ensure that all
uncertainty is expressed by means of the annotated disjunctions. In other words,
given a specific selection, there should no longer be any uncertainty. We therefore
impose the following criterion.

Definition 18 (soundness)

An LPAD C is sound iff all instances of C have an exact well-founded model.

For such LPADs, the following semantics can now be defined.

Definition 19 (instance based semantics $\mu_C$)

Let C be a sound LPAD. For an interpretation I, we denote by W(I) the set of
all C-selections $\sigma$ for which $wfm(C^\sigma) = (I, I)$. The instance based semantics $\mu_C$
of C is the probability distribution on interpretations, that assigns to each I the
probability P(W(I)) of this set of selections W(I).
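The definitions of selections, instances and the instance based semantics can be followed step by step in code. The sketch below is deliberately restricted to ground LPADs whose rule bodies are conjunctions of atoms without negation; for such programs the well-founded model of an instance is simply its least model and is always exact. The rule encoding, the function names and the extra fact for Toss are assumptions of this illustration.

```python
from itertools import product

EMPTY = None  # the extra choice in head*(r): the rule causes nothing

def head_star(rule):
    """head*(r): the annotated head plus the leftover probability mass."""
    head, _body = rule
    return list(head) + [(EMPTY, 1.0 - sum(alpha for _, alpha in head))]

def least_model(instance):
    """Least model of a negation-free program (its exact well-founded model)."""
    model, changed = set(), True
    while changed:
        changed = False
        for atom, body in instance:
            if atom not in model and set(body) <= model:
                model.add(atom)
                changed = True
    return frozenset(model)

def instance_based_semantics(lpad):
    """Map each interpretation I to P(W(I)), for negation-free bodies only."""
    mu = {}
    for choice in product(*(head_star(r) for r in lpad)):
        prob = 1.0        # P(sigma): product of the selected annotations
        instance = []     # C^sigma: one definite rule per non-empty choice
        for (atom, alpha), (_head, body) in zip(choice, lpad):
            prob *= alpha
            if atom is not EMPTY:
                instance.append((atom, body))
        if prob > 0:
            world = least_model(instance)
            mu[world] = mu.get(world, 0.0) + prob
    return mu

coin = [
    ([("Heads", 0.5), ("Tails", 0.5)], ["Toss"]),
    ([("Toss", 1.0)], []),  # assumed fact: the coin is tossed
]
mu = instance_based_semantics(coin)
```

Selections with probability 0 are skipped when building the distribution, since they contribute nothing to any P(W(I)); a check of soundness in the sense of Definition 18 would still have to consider them.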

It is straightforward to extend this definition to allow for exogenous predicates
as well. Indeed, in Section 2.2, we have already seen how to define the well-founded
semantics for rule sets with open predicates, and this is basically all that is needed.
Concretely, given an interpretation X for a set of exogenous predicates, we can
define the instance based semantics $\mu_C^X$ given X as the distribution that assigns, to
each interpretation I of the endogenous predicates, the probability of the set of all
selections $\sigma$ for which (I, I) is the well-founded model of $C^\sigma$ given X. Of course,
this semantics is only defined for LPADs that are sound in X, meaning that the
well-founded model of each $C^\sigma$ given X is two-valued.

8.2 Equivalence to CP-logic 

Every CP-theory is syntactically also an LPAD and vice versa. The key result of
this section is that the instance based semantics $\mu_C$ for LPADs coincides with the
CP-logic semantics $\pi_C$ defined in Sections 3 and 5.

Theorem 9

Let C be a CP-theory that is valid in X. Then C is also an LPAD that is sound in
X and, moreover, $\mu_C^X = \pi_C^X$.

Proof

Proof of this theorem is given in Section [Q □ 

We remark that it is not the case that every sound LPAD is also a valid CP- 
theory. In other words, there are some sound LPADs that cannot be seen as a 
sensible description of a set of probabilistic causal laws. 

Example 17 

It is easy to see that the following CP-theory has no execution models.

$(P : 0.5) \lor (Q : 0.5) \leftarrow R.$
$R \leftarrow \neg P.$
$R \leftarrow \neg Q.$

However, each of its instances has an exact well-founded model: for
$\{P \leftarrow R;\ R \leftarrow \neg P;\ R \leftarrow \neg Q\}$ this is $\{R, P\}$ and for
$\{Q \leftarrow R;\ R \leftarrow \neg P;\ R \leftarrow \neg Q\}$ this is $\{R, Q\}$.
Clearly, this CP-theory does not have execution models that satisfy the temporal
precedence assumption.
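The claim about the two instances of Example 17 can be verified mechanically. The sketch below computes the well-founded model of a ground normal program with the standard alternating fixpoint construction; this is a textbook technique chosen for the illustration, not a procedure taken from this paper, and the rule encoding (head, positive body atoms, negated body atoms) is an assumption.

```python
def gamma(rules, s):
    """Least model of the Gelfond-Lifschitz reduct of `rules` with respect to s."""
    # Drop rules with a negated atom that is in s; ignore remaining negations.
    reduct = [(h, pos) for h, pos, neg in rules if not (set(neg) & s)]
    model, changed = set(), True
    while changed:
        changed = False
        for h, pos in reduct:
            if h not in model and set(pos) <= model:
                model.add(h)
                changed = True
    return model

def well_founded_model(rules):
    """Return (I, J): the atoms certainly true and the atoms possibly true."""
    true = set()
    while True:
        possible = gamma(rules, true)      # overestimate of the true atoms
        new_true = gamma(rules, possible)  # improved underestimate
        if new_true == true:
            return true, possible
        true = new_true

# The two instances of Example 17, encoded as (head, positive, negative).
instance_p = [("P", ["R"], []), ("R", [], ["P"]), ("R", [], ["Q"])]
instance_q = [("Q", ["R"], []), ("R", [], ["P"]), ("R", [], ["Q"])]

wfm_p = well_founded_model(instance_p)
wfm_q = well_founded_model(instance_q)
```

A model is exact when both components of the returned pair coincide. For contrast, the program consisting of `("p", [], ["q"])` and `("q", [], ["p"])` yields an empty first component and `{"p", "q"}` as second component, so it is not exact.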



8.3 Discussion 

The results of this section relate CP-logic to LPADs and, more generally speaking,
to the area of logic programming and its probabilistic extensions. As such, these
results help to position CP-logic among related work, such as Poole's Independent
Choice Logic and McCain and Turner's causal theories, which we will discuss in
Section 9.2. Moreover, they also provide a valuable piece of knowledge
representation methodology for these languages, by clarifying how they can represent causal
information. To illustrate, we now discuss the relevance of our theorem for some
logic programming variants.

Disjunctive logic programs. In probabilistic modeling, it is often useful to consider
the qualitative structure of a theory separately from its probabilistic parameters.
Indeed, for instance, in machine learning, the problems of structure learning and
parameter learning are two very different tasks. If we consider only the structure
of a CP-theory, then, syntactically speaking, we end up with a disjunctive logic
program, i.e., a set of rules:

$h_1 \lor \cdots \lor h_n \leftarrow \phi.$    (37)

We can also single out the qualitative information contained in the semantics $\pi_C$ of
such a CP-theory. Indeed, as we have already seen, like any probability distribution
over interpretations, $\pi_C$ induces a possible world semantics, consisting of those
interpretations I for which $\pi_C(I) > 0$. Thus we can define:

$I \models C$ iff $\pi_C(I) > 0.$

Now, let us restrict our attention to only those CP-theories in which, for every
CP-law r, the sum of the probabilities $\alpha_i$ appearing in head(r) is precisely 1. This
is without loss of generality, since we can simply add an additional disjunct
$(P : 1 - \sum_i \alpha_i)$, with P some new atom, to all rules which do not satisfy this property. It
is easy to see that the set of possible worlds is then independent of the precise values
of the $\alpha_i$, i.e., the qualitative aspects of the semantics of such a theory depend only
on the qualitative aspects of its syntactical form. Stated differently, for any pair of
CP-theories C, C' which differ only on the $\alpha_i$'s, it holds that, for any interpretation
I, $I \models C$ iff $I \models C'$.

From the point of view of disjunctive logic programming, this set of possible
worlds therefore offers an alternative semantics for such a program. Under this
semantics, the intuitive reading of a rule of form (37) is: "$\phi$ causes a non-deterministic
event, whose effect is precisely one of the $h_1, \ldots, h_n$." This is a different
informal reading than in the standard stable model semantics for disjunctive programs
(Przymusinski 1991). Indeed, under our reading, a rule corresponds to a causal
event, whereas, under the stable model reading, it is supposed to describe an
aspect of the reasoning behaviour of a rational agent. Consider, for instance, the
disjunctive program $\{p \lor q.\ p.\}$. To us, this program describes a set of two
non-deterministic events: one event causes either p or q and another event always causes
p. Formally, this leads to two possible worlds, namely {p} and {p, q}. Under the
stable model semantics, however, this program states that an agent believes either
p or q and the agent believes p. In this case, he has no reason to believe q and the
only stable model is {p}. So, clearly, the causal view on disjunctive logic
programming induced by CP-logic is fundamentally different from the standard view and
leads to a different formal semantics. Interestingly, the possible model semantics
(Sakama and Inoue 1994) for disjunctive programs is quite similar to the LPAD
treatment, because it consists of the stable models of instances of a program.
Because, as shown in Section 8.2, the semantics of CP-logic considers the well-founded
models of instances, these two semantics are very closely related. Indeed, for a large
class of programs, including all stratified ones, they coincide completely.



Normal logic programs. Let us consider a logic program P, consisting of a set of
rules $h \leftarrow \phi$, with h a ground atom and $\phi$ a formula. Syntactically, such a program
is also a deterministic CP-theory. Its semantics $\pi_P$ assigns a probability of 1 to a
single interpretation and 0 to all other interpretations. Moreover, the results from
Section 8.2 tell us that the interpretation with probability 1 will be precisely the
well-founded model of P. As such, these results show that a logic program under
the well-founded semantics can be viewed as a description of deterministic causal
information. Concretely, we find that we can read a rule $h \leftarrow \phi$ as: "$\phi$ causes a
deterministic event, whose effect is h."

This observation makes explicit the connection between causal reasoning and 
logic programming that has long been implicitly present in this field, as is witnessed, 
e.g., by the work on situation calculus in logic programming. As such, it enhances 
the theoretical foundations behind the pragmatic use of logic programs to represent 
causal events.

FO(ID). FO(ID) (also called ID-logic) (Denecker and Ternovska 2007) extends
classical logic with inductive definitions. Similar to the way they appear in
mathematical texts, an inductive definition is represented as a set of definitional rules, which
are of the form $\forall x\ p(t) \leftarrow \phi$, where x is a tuple of variables, $\phi$ is a first-order
formula and p(t) an atom. Such a definition defines all predicates in the head of the
rules by simultaneous induction in terms of the other predicates, which are called
the open predicates of the definition. This syntax offers a uniform way of expressing
the most important forms of inductive definitions found in mathematics, including
monotone, transfinite and iterated inductive definitions, and inductive definitions
over a well-founded order. Formally, the semantics of such a definition is given
by the well-founded semantics, which has been shown to correctly formalise these
forms of inductive definitions. To be more concrete, an interpretation I is a model
of a definition D if it interprets the defined predicates as the well-founded model
of D extending the restriction of I to the open symbols of D.

Our results show that finite propositional definitions in FO(ID) are, both syntac- 
tically and semantically, identical to deterministic CP-theories. We can therefore 
view such a set of rules as both an inductive definition and a description of a causal 
process. This relation between induction and causality may be remarkable, but it 
is not all that surprising. In essence, an inductive definition defines a concept by 
describing how to construct it. As such, an inductive definition also specifies a con- 
struction process, and such processes are basically causal in nature. Or to put it 
another way, an inductive definition is nothing more than a description of a causal 
process, that takes place not in the real world, but in the domain of mathemati- 
cal objects. This suggests that the ability of mathematicians and formal scientists 
in general to understand inductive definitions is rooted deeply in human common 
sense, in particular our ability to understand and reason about causation in the 
physical world. 



9 Related work 

In this section, we discuss some research that is related to our work on CP-logic. 
Roughly speaking, we can divide this into two different categories, namely, the 
related work that focuses mainly on formalizing causality and that which focuses 
mainly on representing probabilistic knowledge. 



9.1 Pearl's causal models 

Our work on CP-logic studies causality from a knowledge representation
perspective. As such, it is closely related to the work of Pearl (2000). His work uses Bayesian
networks and structural models as formal tools. In Section 7, we have already
compared CP-logic to Bayesian networks and showed that it offers certain
representational advantages. A structural model is a set of equations, each of which defines
the value of one random variable in terms of the values of a set of other random
variables. For each endogenous random variable, there is precisely one such
defining equation. As for Bayesian networks, we can say here that a CP-theory is more
detailed, since it represents individual causal laws, while a single structural model
equation has to take into account all of the random variables that have a direct
influence on the value of the defined random variable. Another similarity to Bayesian
networks is that structural models have to be acyclic as well, which means that, in
this sense, they are also less general than CP-logic. Apart from this, a lot depends
of course on the particular form that these equations take, so there is not much
more that can be said in general about this.

Pearl's work mainly focuses on the behaviour of causal models in the presence
of interventions. As we have shown in Section 7.5, it is possible to consider similar
interventions in the context of CP-logic. Pearl uses interventions for a number of
interesting purposes, such as handling counterfactuals. They have also been used
to define concepts such as "actual causes" (Halpern and Pearl 2001a) and
"explanations" (Halpern and Pearl 2001b). The explicitly dynamic processes of CP-logic
seem to offer an interesting setting in which to investigate these concepts as well.
Indeed, in any particular branch of an execution model of a CP-theory, every true
atom p is caused by at least one CP-law whose precondition $\phi$ was satisfied at the
time when this event happened. It now seems sensible to call $\phi$ an actual cause of
p. An interesting question is to what extent such a definition would coincide with
the notion of actual causation defined by Halpern. However, we leave these issues
for future work.

9.2 Causality in logic programs 

In the area of logic programming, many languages can be found which express some
kind of (non-probabilistic) causal laws. A typical example is McCain and Turner's
causal theories (McCain and Turner 1996). A causal theory is a set of rules $\phi \Leftarrow \psi$,
where $\phi$ and $\psi$ are propositional formulas. The semantics of such a theory T is
defined by a fixpoint criterion. Concretely, an interpretation I is a model of T if I
is the unique classical model of the theory $T^I$ that consists of all $\phi$, for which there
is a rule $\phi \Leftarrow \psi$ in T such that $I \models \psi$.

In CP-logic, we assume that the domain is initially in a certain state, which then 
changes through a series of events. This naturally leads to the kind of constructive 
processes that we have used to define the formal semantics of CP-logic. By contrast, 
according to McCain and Turner's fixpoint condition, a proposition can have any 
truth value, as long as there exists some causal explanation for this truth value. 
This difference mainly manifests itself in two ways. 

First, in CP-logic, every endogenous property has an initial truth value, which can 
only change as the result of an event. As such, there is a fundamental asymmetry 
between falsity and truth, since only one of them represents the "natural" state 
of the property. For McCain & Turner, however, truth and falsity are completely 
symmetric and both need to be causally explained. As such, if the theory is to have 
any models, then, for every proposition Q, there must always be a cause for either
Q or $\neg Q$.

A second difference is that the constructive processes of CP-logic rule out any 
unfounded causality, i.e., it cannot be the case that properties spontaneously cause 
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themselves. In McCain & Turner's theories, this "spontaneous generation" of prop- 
erties can occur. For instance, the CP-theory {Q <— Q} has {} as its (unique) 
model, whereas the causal theory {Q ■<= Q} has {Q} as its (unique) model. As 
such, the direct representation of cyclic causal relations that is possible in CP-logic 
(e.g., Example[2]) cannot be done in causal theories; instead, one has to use an en- 
coding similar to the one needed in Bayesian networks (e.g., Figure[l2]). In practice, 
the main advantage of McCain & Turner's treatment of causal cycles seems to be 
that it offers a way of introducing exogenous atoms into the language. Indeed, by 
including both Q •<= Q and ->Q <= ^Q, one can express that Q can have any truth 
value, without this requiring any further causal explanation. Of course, CP-logic 
has no need for such a mechanism, since we make an explicit distinction between 
exogenous and endogenous predicates. It is interesting to observe that, given the 
relation between logic programming and causal theories proven in ( jMcCain 1997[) . 
this difference actually corresponds to the difference between the well-founded and 
completion semantics for logic programs. 
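
The contrast between the two readings of a self-supporting rule can be checked mechanically. The following Python sketch is an illustration only, under assumptions of our own: rules are propositional, with a single literal in the head and a conjunction of literals in the body; the constructive reading is taken to be the least fixpoint of negation-free rules; and the McCain & Turner reading uses the criterion that an interpretation is causally explained when the heads of the rules whose bodies it satisfies are exactly its own literals. The function names and the string encoding of literals (a leading "-" for negation) are hypothetical.

```python
from itertools import product

def neg(lit):
    """Return the complement of a literal; '-q' encodes the negation of 'q'."""
    return lit[1:] if lit.startswith("-") else "-" + lit

def least_model(rules):
    """Constructive reading: start with every atom false and fire rules
    until nothing changes. Rules are (head_atom, [body_atoms]), no negation."""
    true = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in true and all(b in true for b in body):
                true.add(head)
                changed = True
    return true

def causally_explained(rules, atoms):
    """McCain & Turner style reading: an interpretation is a model iff the
    heads of all rules whose body it satisfies are exactly its literals."""
    models = []
    for values in product([True, False], repeat=len(atoms)):
        literals = {a if v else neg(a) for a, v in zip(atoms, values)}
        caused = {h for h, body in rules if all(b in literals for b in body)}
        if caused == literals:
            models.append({a for a, v in zip(atoms, values) if v})
    return models

print(least_model([("q", ["q"])]))                                # set()
print(causally_explained([("q", ["q"])], ["q"]))                  # [{'q'}]
print(causally_explained([("q", ["q"]), ("-q", ["-q"])], ["q"]))  # [{'q'}, set()]
```

The first two calls reproduce the models $\{\}$ and $\{Q\}$ mentioned above; the third shows how adding the rule for $\neg Q$ makes both truth values admissible, which is the encoding of an exogenous atom.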

9.3 Action languages 

In Knowledge Representation, a significant amount of work has been done on the topic of
action languages, such as A, B and C (Gelfond and Lifschitz 1998). These languages
have in common with CP-logic that causality plays a significant role in their setting,
and also that they are closely related to logic programming (Lifschitz and Turner 1999).
Moreover, there also exist probabilistic extensions of these languages (Baral et al. 2002).

Action languages conceive the world as consisting of a system of causal laws, 
together with an external agent who can decide to act upon this system. Crucially, 
the agent's decisions are themselves not governed by the system's causal laws. 
Therefore, if we consider, e.g., Pearl's framework, the closest thing to this concept 
of an action would be his notion of an intervention. Indeed, it has been shown in 
(Tran and Baral 2004) that interventions can be encoded in the probabilistic action
language PAL (Baral et al. 2002) precisely as actions. In CP-logic, the most natural 
way of encoding an action would be, for the same reason, by means of an exogenous 
predicate. While the behaviour of an agent in an action language is not determined 
by the state of the system, it is however typically restrained by it: certain actions 
are only available in certain states. Neither of these two ways of encoding actions 
(i.e., in Pearl's framework or in ours) would be able to directly represent such 
constraints. 

Action languages allow the effects of an action to be specified by means of so- 
called dynamic causal laws. In Pearl's framework, the effect of an action would be 
given as part of the specification of the intervention that is performed. In CP-logic, 
such knowledge would take the form of a CP-law whose body contains the exoge- 
nous predicate representing the action, and whose head contains some endogenous 
predicate that is affected by the action. 

Besides these dynamic causal laws, there are also static causal laws. These repre- 
sent the causal laws that are obeyed by the system itself; once we know the direct 
effects of the agent's actions, the static causal laws tell us how these propagate
through the rest of the system. In Pearl's framework, they would correspond to
the functional equations that were not intervened upon. In CP-logic, they would be
CP-laws with endogenous predicates in body and head. 

As this brief discussion demonstrates, we could conceive of a probabilistic action 
language in which the (dynamic and static) causal laws are expressed in CP-logic, 
while the actions and constraints thereon are defined in some other language. To 
the best of our knowledge, however, all of the approaches in current literature use 
essentially a McCain and Turner-style representation for the causal laws (see, e.g.,
(Giunchiglia and Lifschitz 1998)).

To illustrate, let us consider the transmission in a car. This is a system consisting 
of a number of gear wheels, which can be connected in various ways, such that 
turning one of the gear wheels causes connected gear wheels to turn also. This 
system can be represented by a set of causal laws, and this can be done better in
CP-logic than in McCain and Turner's logic. The reason is that there exist cyclic
causal relations between connected gear wheels: if the engine is turning, this can 
cause the car to move, but vice versa, if the car is moving, this can also cause the 
engine to turn ("engine braking"). As explained above, such cyclic causality
is handled better in CP-logic. Note also that in the context of probabilistic planning,
the McCain and Turner style representation poses the problem that loops in the
causal laws (e.g., $p \Leftarrow q$ and $q \Leftarrow p$) lead to uncertainty (e.g., $p$ and $q$ can either both
be true or both be false) that is not probabilistically quantified. PAL, for instance, 
solves this problem by assuming that, in such a case, all possible states are equally 
likely. 

A CP-logic representation of the transmission would probably use predicates such 
as Clutch and ShiftGear to refer to the actions available to the driver. These would 
be exogenous predicates, and we would not say anything more about them. An 
action language, however, could do more. It would also allow one to express constraints
on these actions, such as that shifting gears is only allowed while the clutch is in 
operation. In such a setting, we would have enough information to help the driver 
come up with a plan to, e.g., put the car into fourth gear without stalling, which 
cannot be done using just CP-logic by itself. 



9.4 Probabilistic languages

Part of the motivation behind CP-logic was to provide a probabilistic logic pro- 
gramming language in which statements have an intuitive meaning that can easily 
be explained to domain experts, without having to rely on any prior knowledge of 
logic programming. To this end, we have developed, from scratch, a formalization of 
the concept of a probabilistic causal law; the resulting language was then shown to 
be equivalent to the probabilistic logic programming construction of LPADs. This 
result allows us to interpret probabilistic logic programs in a new way, namely as 
sets of probabilistic causal laws. This does not only hold for LPADs themselves, 
but also for related languages, which we will now discuss in more detail.

9.4.1 Independent Choice Logic

Independent Choice Logic (ICL) (Poole 1997) by Poole is a probabilistic extension
of abductive logic programming that extends the earlier formalism of Probabilistic
Horn Abduction (Poole 1993). An ICL theory consists of both a logical and a
probabilistic part. The logical part is an acyclic logic program. The probabilistic part
consists of a set of rules of the form (in CP-logic syntax):

$(h_1 : \alpha_1) \lor \cdots \lor (h_n : \alpha_n)$

such that $\sum_{i=1}^{n} \alpha_i = 1$. The atoms $h_i$ in such clauses are called abducibles. Each
abducible may only appear once in the probabilistic part of an ICL program; in the 
logical part of the program, abducibles may only appear in the bodies of clauses. 

Syntactically speaking, each ICL theory is also a CP-theory. Moreover, the ICL
semantics of such a theory (as formulated in, e.g., (Poole 1997)) can easily be seen
to coincide with our instance-based semantics for LPADs. As such, an ICL
theory can be seen as a CP-theory in which every CP-law is either deterministic or
unconditional.

We can also translate certain LPADs to ICL in a straightforward way. Concretely, 
this can be done for acyclic LPADs without exogenous predicates, for which the 
bodies of all CP-laws are conjunctions of literals. Such a CP-law r of the form: 

$(h_1 : \alpha_1) \lor \cdots \lor (h_n : \alpha_n) \leftarrow \varphi$

is then transformed into the set of rules:

$h_1 \leftarrow \varphi \land Choice_r(1).$

$\vdots$

$h_n \leftarrow \varphi \land Choice_r(n).$

$(Choice_r(1) : \alpha_1) \lor \cdots \lor (Choice_r(n) : \alpha_n).$

The idea behind this transformation is that every selection of the original theory
$C$ corresponds to precisely one selection of the translation $C'$. More precisely, if
we denote by $ChoiceRule(r)$ the last CP-law in the above translation of a rule $r$,
then a $C$-selection $\sigma$ corresponds to the $C'$-selection $\sigma'$, for which for all $r \in C$,
$\sigma(r) = (h_i : \alpha_i)$ iff $\sigma'(ChoiceRule(r)) = (Choice_r(i) : \alpha_i)$. It is quite obvious
that this one-to-one correspondence preserves both the probabilities of selections
and the (restrictions to the original vocabulary of the) well-founded models of the
instances of selections. This suffices to show that the probability distribution defined
by $C$ coincides with the (restriction to the original vocabulary of) the probability
distribution defined by $C'$.
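
The translation and the claimed correspondence can be tried out on a small ground theory. The Python sketch below is a minimal illustration, assuming ground atoms, bodies that are conjunctions of positive atoms only, and head probabilities that sum to one; the tuple encoding of CP-laws, the generated atom names such as `choice_r2_1`, and the coin-tossing theory are all hypothetical. The function `distribution` follows the instance-based reading: it enumerates selections, computes the least model of each instance, and restricts it to a given vocabulary.

```python
from itertools import product
from fractions import Fraction

# A CP-law is (name, [(head_atom, probability), ...], [body_atoms]).

def to_icl(theory):
    """Translate every law r into deterministic rules plus one choice rule."""
    translated = []
    for name, heads, body in theory:
        choices = []
        for i, (atom, prob) in enumerate(heads, start=1):
            choice = f"choice_{name}_{i}"
            choices.append((choice, prob))
            # h_i <- body AND Choice_r(i), a deterministic law
            translated.append((f"{name}_{i}", [(atom, Fraction(1))], body + [choice]))
        # (Choice_r(1) : a_1) v ... v (Choice_r(n) : a_n), unconditional
        translated.append((f"choose_{name}", choices, []))
    return translated

def distribution(theory, vocabulary):
    """Sum the probability of every selection, grouped by the least model
    of its instance restricted to `vocabulary`."""
    result = {}
    for selection in product(*(heads for _, heads, _ in theory)):
        weight = Fraction(1)
        rules = []
        for (atom, prob), (_, _, body) in zip(selection, theory):
            weight *= prob
            rules.append((atom, body))
        true, changed = set(), True
        while changed:
            changed = False
            for head, body in rules:
                if head not in true and all(b in true for b in body):
                    true.add(head)
                    changed = True
        key = frozenset(true & vocabulary)
        result[key] = result.get(key, Fraction(0)) + weight
    return result

half = Fraction(1, 2)
theory = [
    ("r1", [("toss", Fraction(1))], []),
    ("r2", [("heads", half), ("tails", half)], ["toss"]),
    ("r3", [("win", Fraction(3, 4)), ("lose", Fraction(1, 4))], ["heads"]),
]
vocab = {"toss", "heads", "tails", "win", "lose"}
assert distribution(theory, vocab) == distribution(to_icl(theory), vocab)
```

Restricting to the original vocabulary is what hides the auxiliary choice atoms, so that the two distributions can be compared directly.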

So, our result on the equivalence between LPADs and CP-logic shows that the 
two parts of an ICL theory can be understood as, respectively, a set of unconditional 
probabilistic events and a set of deterministic causal events. In this sense, our work 
offers a causal interpretation for ICL. It is, in this respect, somewhat related to 
the work of Finzi et al. on causality in ICL. In (Finzi and Lukasiewicz 2003), these 
authors present a mapping of ICL into Pearl's structural models and use this to 
derive a concept of actual causation for this logic, based on the work by Halpern
(Halpern and Pearl 2001a). This approach is, however, somewhat opposite to ours.
Indeed, we view a CP-theory, with its structure based on individual probabilistic 
causal laws, as a more fine-grained model of causality. Transforming a CP-theory 
into a structural model actually loses information, in the sense that it is not possible 
to recover the original structure of the theory. From the point-of-view of CP-logic, 
the approach of Finzi et al. would therefore not make much sense, since it would 
attempt to define the concept of actual causation in a more fine-grained model of 
causal information by means of a transition to a coarser one. 



9.4.2 P-log 

P-log (Baral et al. 2004; Baral et al. 2008) is an extension of the language of
Answer Set Prolog with new constructs for representing probabilistic information. It
is a sorted logic, which allows for the definition of attributes, which map tuples 
(of particular sorts) into a value (of a particular sort). Two kinds of probabilistic 
statements are considered. The first are called random selection rules and are of 
the form: 

$[r]\ random(A(t) : \{x : P(x)\}) \leftarrow \varphi.$

Here, $r$ is a name for the rule, $P$ is a unary boolean attribute, $A$ is an attribute
with $t$ a vector of arguments of appropriate sorts, and $\varphi$ is a collection of
so-called extended literals (see footnote 8). The meaning of a statement of the above form is that,
if the body $\varphi$ of the rule is satisfied, the attribute $A(t)$ is selected at random
from the intersection of its domain with the set of all terms x for which P(x) holds 
(unless some deliberate action intervenes). The label r is a name for the experiment 
that performs this random selection. The choice of which value will be assigned to 
this attribute is random and, by default, all possible values are considered equally 
likely. It is, however, possible to override such a default, using the second kind of 
statements, called probabilistic atoms. These are of the form: 

$pr_r(A(t) = y \mid_c \varphi) = \alpha.$

Such a statement should be read as: if the value of $A(t)$ is determined by the
experiment $r$, and if $\varphi$ also holds, then the probability of $A(t) = y$ is $\alpha$.

The information expressed by a random selection rule and its associated proba- 
bilistic atoms is somewhat similar to a CP-law, but stays closer to a Bayesian net- 
work style representation. Indeed, it expresses that, under certain conditions, the 
value of a certain attribute will be determined by some implicit random process, 
which produces each of a number of possible outcomes with a certain probability. 
We see that, as in Bayesian networks, there is no way of directly representing in- 
formation about the actual events that might take place; instead, only information 
about the way in which they eventually affect the value of some attribute (or
random variable, in Bayesian network terminology) can be incorporated. Therefore,
representing the kind of phenomena discussed in Section 7 (namely, cyclic causal
relations and effects with a number of independent possible causes) requires the
same kind of encoding in P-log as in Bayesian networks.

8 An extended literal is either a classical literal or a classical literal preceded by the default
negation not, where a classical literal is either an atom $A(t) = t_0$ or the classical negation
$\neg A(t) = t_0$ thereof.

A second interesting difference is that a random-statement of P-log represents an 
experiment in which a value is selected from a dynamic set of alternatives, whereas, 
in CP-logic the set of possible outcomes is specified statically. Consider, for instance, 
a robot that leaves a room by selecting at random one of the doors that happens 
to be open. In P-log, this can easily be written down as: 

$[r]\ random(Leave\_through : \{x : Open\_door(x)\}).$

In CP-logic, such a concise representation is currently not possible. 
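
To make the notion of a dynamic range concrete, the following Python sketch mimics the default reading of such a random selection rule: the set of alternatives is computed from the current state, and all alternatives are taken to be equally likely. It is not P-log itself; the function name, the door names and the set of open doors are hypothetical.

```python
from fractions import Fraction

def random_selection(domain, condition):
    """Uniform distribution over the values of `domain` that satisfy
    `condition`, mirroring the default reading of
    [r] random(A : {x : P(x)})."""
    candidates = [x for x in domain if condition(x)]
    if not candidates:
        return {}  # the experiment has no possible outcome
    return {x: Fraction(1, len(candidates)) for x in candidates}

doors = ["north", "east", "south", "west"]
open_doors = {"north", "south", "west"}   # hypothetical current state
print(random_selection(doors, lambda d: d in open_doors))
```

The range of the selection changes whenever `open_doors` changes, without the rule itself having to be rewritten; this is the feature that a statically specified set of outcomes lacks.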

Apart from probabilistic statements, a P-log program can also contain a set of 
regular Answer Set Prolog rules and a set of observations and interventions. The 
difference between observations and interventions is the same as highlighted by 
Pearl, and (Baral and Hunsaker 2007) shows that interventions in P-log can be used
to perform the same kind of counterfactual reasoning as Pearl does. One interesting
difference, however, is that in P-log interventions are actually represented within
the theory, whereas Pearl's approach (as well as the one we presented in Section
7.5) views interventions as meta-manipulations of theories.

In summary, the scope of P-log is significantly broader than that of CP-logic and 
it is a more full-blown knowledge representation language than CP-logic, which 
is only aimed at expressing a specific kind of probabilistic causal laws. However, 
when it comes to representing just this kind of knowledge, CP-logic offers the same 
advantages over P-log that it does over Bayesian networks. 



9.4.3 First-order Versions of Bayesian networks

In this section, we discuss two approaches that aim at lifting the propositional
formalism of Bayesian networks to a first-order representation, namely Bayesian Logic
Programs (BLPs) (Kersting and De Raedt 2000) and Relational Bayesian Networks
(RBNs) (Jaeger 1997).

A Bayesian Logic Program or BLP consists of a set of definite clauses, using the
symbol "|" instead of "$\leftarrow$", i.e., clauses of the form

$P(t_0) \mid B_1(t_1), \ldots, B_n(t_n).$

in which $P$ and the $B_i$'s are predicate symbols and the $t_j$'s are tuples of terms.
For every predicate symbol $P$, there is a domain $dom(P)$ of possible values. The
meaning of such a program is given by a Bayesian network, whose nodes consist
of all the atoms in the least Herbrand model of the program. The domain of a
node for a ground atom $P(t)$ is $dom(P)$. For every ground instantiation
$P(t_0) \mid B_1(t_1), \ldots, B_n(t_n)$ of a clause in the program, the network contains an edge from
each $B_i(t_i)$ to $P(t_0)$, and these are the only edges that exist.

To complete the definition of this Bayesian network, all the relevant conditional 
probabilities also need to be defined. To this end, the user needs to specify, for each 
clause in the program, a conditional probability table, which defines the conditional
probability of every value in $dom(P)$, given an assignment of values to the atoms
in the body of the clause. Now, let us first assume that every ground atom in 
the Bayesian network is an instantiation of the head of precisely one clause in the 
program. In this case, the tables for the clauses suffice to determine the conditional 
probability tables of the network, because every node can then simply take its 
probability table from this unique clause. However, in general, there might be many 
such clauses. To also handle this case, the user needs to specify, for each predicate 
symbol P, a so-called combination rule, which is a function that produces a single 
probability from a multiset of probabilities. The conditional probability table for 
a ground atom P(t) can then be constructed from the set of all clauses r, such 
that $P(t)$ is an instantiation of $head(r)$, by finding the appropriate entries in the
tables for all such clauses r and then applying the combination rule for P to the 
multiset of these values. According to the semantics of Bayesian Logic Programs, 
this combination rule will always be applied, even when there exists only a single 
such r. 

This completes the definition of BLPs as given in, e.g., (Kersting and De Raedt 2000).
More recently, a number of issues with this formalism have led to the
development of Logical Bayesian Networks (Fierens et al. 2005). These issues have also
prompted the addition of so-called "logical atoms" to the original BLP language
(Kersting and De Raedt 2007). Since this does not significantly affect any of the
comparisons made in this section, however, we will ignore this extension.

A Relational Bayesian Network (Jaeger 1997) is a Bayesian network in which the
nodes correspond to predicate symbols and the domain of a node for a predicate
$P/n$ consists of all possible interpretations of this predicate symbol in some fixed
domain $D$, i.e., all subsets of $D^n$. The conditional probability distribution associated
to such a node $P$ is specified by a probability formula $F_P$. For every tuple $d \in D^n$,
$F_P(d)$ defines the probability of $d$ belonging to the interpretation of $P$ in terms of
probabilities of tuples $d'$ belonging to the interpretation of a predicate $P'$, where $P'$
is either a parent of P in the graph or even, under certain conditions, P itself. Such 
a probability formula can contain a number of different operations on probabilities, 
including the application of arbitrary combination rules. Such a Relational Bayesian 
Network can also be compiled into a network that is similar to that generated by a 
BLP, i.e., one in which the nodes correspond to domain atoms instead of predicate 
symbols. The main advantage of such a compiled network is that it allows more 
efficient inference. 

Again, the main difference between these two formalisms and CP-logic is that 
they both stick to the Bayesian network style of modeling, in the sense that the 
actual events that determine the values of the random variables are entirely ab- 
stracted away and only the resulting conditional probabilities are retained. How- 
ever, through the use of, respectively, combination rules and probability formulas, 
these can be represented in a more structured manner than in a simple table. In 
this way, knowledge about the underlying causal events can be exploited to
represent the conditional probability distributions in a concise way. The most common
example is probably the use of the noisy-or to handle an effect which has a number
of independent possible causes. For instance, let us consider the Russian roulette
problem of Example 15. In a BLP, the relation between the guns firing and the
player's death could be represented by the following clause: 

Death | Fire(X). 





            Fire(x) = t    Fire(x) = f
Death = t       1/6             0
Death = f       5/6             1

Combination rule for Death: noisy-or



In Relational Bayesian Networks, this would be represented as follows:

$F_{Death} = \text{noisy-or}(\{1/6 \cdot Fire(x) \mid x\})$




As such, combination rules do allow some knowledge about the events underlying 
the conditional probabilities to be incorporated into the model. However, this is of 
course not the same as actually having a structured representation of the events 
themselves, as is offered by CP-logic. As a consequence of this, cyclic causal
relations, such as that of our Pneumonia-Angina example, still need the same kind of
encoding as in a Bayesian network.
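
For acyclic cases such as the Russian roulette problem, the two styles of representation agree numerically, which the following Python sketch illustrates. It assumes, purely for the sake of the example, that two guns are fired and that each causes death with probability 1/6. The function `noisy_or` combines the table entries as a combination rule would; `death_by_enumeration` instead treats each firing as a separate probabilistic event, in the spirit of one law $(Death : 1/6) \leftarrow Fire(g)$ per gun, and enumerates the outcomes. Both function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def noisy_or(probabilities):
    """Combine independent causes: the effect fails only if every cause fails."""
    failure = Fraction(1)
    for p in probabilities:
        failure *= 1 - p
    return 1 - failure

def death_by_enumeration(guns_fired, p=Fraction(1, 6)):
    """Event-level reading: one law (Death : p) <- Fire(g) per fired gun.
    Enumerate, for each law, whether it causes Death, and sum the weights
    of the outcomes in which at least one of them does."""
    total = Fraction(0)
    for outcome in product([True, False], repeat=guns_fired):
        weight = Fraction(1)
        for caused in outcome:
            weight *= p if caused else 1 - p
        if any(outcome):
            total += weight
    return total

p = Fraction(1, 6)
print(noisy_or([p, p]))          # 11/36
print(death_by_enumeration(2))   # 11/36
```

The difference between the formalisms therefore lies not in the resulting probability but in what is written down: the combination rule summarizes the events, whereas the event-level reading keeps them as separate statements.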



9.4.4 Other approaches

In this section, we give a quick overview of some other related languages. An
important class of probabilistic logic programming formalisms are those following
the Knowledge Based Model Construction approach. Such formalisms allow the
representation of an entire "class" of propositional models, from which, for a
specific query, an appropriate model can then be constructed "at run-time". This
approach was initiated by Breese (Breese 1992) and Bacchus (Bacchus 1993) and
is followed by both Bayesian Logic Programs and Relational Bayesian Networks.
Other formalisms in this class are Probabilistic Knowledge Bases of Ngo and
Haddawy (Ngo and Haddawy 1997) and Probabilistic Relational Models of Getoor et al.
(Getoor et al. 2001). From the point of view of comparison to CP-logic, both are
very similar to Bayesian Logic Programs (see, e.g., (Kersting and De Raedt 2001)
for a comparison).

The language used in the Programming in Statistical Modeling system (PRISM)
(Sato and Kameya 1997) is very similar to Independent Choice Logic. Our
comments concerning the relation between CP-logic and Independent Choice Logic
therefore carry over to PRISM.

Like CP-logic, Many-Valued Disjunctive Logic Programs (Lukasiewicz 2001) are 
also related to disjunctive logic programming. However, in this language,
probabilities are associated with disjunctive clauses as a whole. In this way, uncertainty
of the implication itself (and not, as is the case with LPADs or CP-logic, of the
disjuncts in the head) is expressed.

All the works mentioned so far use point probabilities. There are, however, also a
number of formalisms using probability intervals: Probabilistic Logic Programs of Ng
and Subrahmanian (Ng and Subrahmanian 1992), their extension to Hybrid
Probabilistic Programs of Dekhtyar and Subrahmanian (Dekhtyar and Subrahmanian 2000)
and Probabilistic Deductive Databases of Lakshmanan and Sadri (Lakshmanan and Sadri 1994).
Contrary to our approach, programs in these formalisms do not define a single
probability distribution, but rather a set of possible probability distributions, which
allows one to express a kind of "meta-uncertainty", i.e., uncertainty about which
probability distribution is the "right" one. Moreover, the techniques used by these
formalisms tend to have more in common with constraint logic programming than
standard logic programming. The more recent formalism of CLP(BN) (Santos Costa et al. 2003)
belongs to this class.

We also want to mention Stochastic Logic Programs of Muggleton and Cussens
(Cussens 2000; Muggleton 2000), which is a probabilistic extension of Prolog. In this
formalism, probabilities are attached to the selection of clauses in Prolog's
SLD-resolution algorithm, which basically results in a first-order version of stochastic
context-free grammars. Because of this formalism's strong ties to the procedural
aspects of Prolog, it appears to be quite different from CP-logic and indeed all of 
the other formalisms mentioned here. 

ProbLog (De Raedt et al. 2007) is a more recent probabilistic extension of pure
Prolog. Here, too, every clause is labeled with a probability. The semantics of
ProbLog is very similar to that of LPADs and, in fact, the semantics of a ground
ProbLog program coincides completely with that of the corresponding LPAD. More
precisely put, a ProbLog rule of the form:

$\alpha : h \leftarrow b_1, \ldots, b_n,$

where $h$ and the $b_i$ are ground atoms, is entirely equivalent to the LPAD rule:

$(h : \alpha) \leftarrow b_1, \ldots, b_n.$

For non-ground programs, however, there is a difference. The semantics of an LPAD 
first grounds the entire program and then probabilistically selects instantiations of 
the rules of this ground program. In ProbLog, on the other hand, selections directly 
pick out rules of the original program. This means that, for instance, the following 
ProbLog-rule: 

$0.8 : likes(X, Y) \leftarrow likes(X, Z), likes(Z, Y),$

specifies that, with probability 0.8, the likes-relation is entirely transitive, whereas
the corresponding LPAD-rule would mean that for all individuals a, b and c, the 
fact that a likes b and b likes c causes a to like c with probability 0.8. 
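
The two readings of the non-ground rule give different probabilities already on a very small example. The Python sketch below is an illustration under assumptions of our own: the three `likes` facts are hypothetical, the rule-level reading is the one attributed to ProbLog in the text above, and the instance-level reading enumerates only those ground instances whose body can ever become true (the remaining instances never fire and so cannot affect the result).

```python
from fractions import Fraction
from itertools import product

FACTS = {("a", "b"), ("b", "c"), ("c", "d")}   # hypothetical likes/2 facts
P = Fraction(4, 5)                              # the label 0.8 of the rule

def closure(facts, allowed=None):
    """Least model of likes(X,Y) <- likes(X,Z), likes(Z,Y) over `facts`.
    If `allowed` is given, only those ground instances (X, Z, Y) may fire."""
    likes = set(facts)
    changed = True
    while changed:
        changed = False
        for (x, z), (z2, y) in product(list(likes), repeat=2):
            if z == z2 and (x, y) not in likes:
                if allowed is None or (x, z, y) in allowed:
                    likes.add((x, y))
                    changed = True
    return likes

def rule_level(query):
    """One probabilistic choice for the rule as a whole."""
    on = closure(FACTS)
    return (P if query(on) else 0) + ((1 - P) if query(FACTS) else 0)

def instance_level(query):
    """LPAD reading: an independent choice for every ground instance."""
    full = closure(FACTS)
    instances = sorted((x, z, y) for (x, z) in full for (z2, y) in full if z == z2)
    total = Fraction(0)
    for picks in product([True, False], repeat=len(instances)):
        allowed = {inst for inst, on in zip(instances, picks) if on}
        weight = Fraction(1)
        for on in picks:
            weight *= P if on else 1 - P
        if query(closure(FACTS, allowed)):
            total += weight
    return total

both = lambda likes: ("a", "c") in likes and ("b", "d") in likes
print(rule_level(both), instance_level(both))   # 4/5 16/25
```

Under the rule-level reading the two derived facts stand or fall together, so their conjunction has probability 0.8; under the instance-level reading they depend on two independent events, giving 0.8 x 0.8 = 0.64.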

10 Conclusions and future work 

Causality has an inherent dynamic aspect, which can be captured at the semantical 
level by the probability tree framework that Shafer has developed in (Shafer 1996). 
He concludes this book with the following observation:

When we think of a Bayes net as a representation of a probability tree, we sometimes
may also want to leave indeterminate orderings that are not imposed by arrows in the 
graph, so that the net can be thought of as a representation not of a single tree but of 
a class of trees, corresponding to different choices for these orderings. The possibility of 
introducing indeterminacy in the ordering of judgements is obviously equally present in 
the type-theoretical representation. [...] [W]e can think of [a large collection of partially 
ordered judgements] as a set of rules from which martingale trees can be constructed. [...] 
More abstractly, they can be thought of as causal laws, and we can imagine many problems 
of deliberation being posed and solved directly in terms of these causal laws, without the 
specification of a martingale tree. Thus type theory can take us beyond probability trees 
to a more general framework for causal deliberation. 

This paper can be seen as an extension of Shafer's work in the direction pointed at 
by the above remarks. We have developed a logical language which uses probabilistic 
causal laws to concisely represent classes of probability trees. Our representation 
does not explicitly impose any order on the possible events, since we move away 
from a representation in which it is the outcome of previous events that causes a 
new event, to one in which new events are caused by properties of the current state 
of the domain. Even though this means that the probability trees corresponding to a 
given set of causal laws might be considerably different from one another, we prove 
that they will all still generate the same probability distribution over their final 
states. Therefore, such causal laws capture precisely those properties of probability 
trees that we need to answer questions about the probabilities in these final states, 
which is what we are typically interested in. 
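
The order-independence property can be observed on a small cyclic theory. The Python sketch below is a minimal illustration, restricted to laws with positive bodies; the encoding of laws, the probabilities and the two scheduling policies are hypothetical and are not taken from the examples of this paper. For a given policy, `final_distribution` builds one probability tree and returns the distribution over its final states.

```python
from fractions import Fraction

# A law is (heads, body): heads is a list of (atom or None, probability),
# where None stands for "no effect"; body is a list of atoms that must hold.
THEORY = [
    ([("pneumonia", Fraction(3, 10)), (None, Fraction(7, 10))], []),
    ([("angina", Fraction(1, 5)), (None, Fraction(4, 5))], []),
    ([("angina", Fraction(1, 2)), (None, Fraction(1, 2))], ["pneumonia"]),
    ([("pneumonia", Fraction(2, 5)), (None, Fraction(3, 5))], ["angina"]),
]

def final_distribution(theory, pick):
    """Build one probability tree: in each state, `pick` chooses which of the
    applicable laws (body true, not yet happened) happens next. Returns the
    distribution over the final states (leaves) of that tree."""
    leaves = {}
    stack = [(frozenset(), frozenset(), Fraction(1))]
    while stack:
        state, happened, weight = stack.pop()
        applicable = [i for i, (_, body) in enumerate(theory)
                      if i not in happened and all(b in state for b in body)]
        if not applicable:
            leaves[state] = leaves.get(state, Fraction(0)) + weight
            continue
        i = pick(applicable)
        for atom, prob in theory[i][0]:
            new_state = state if atom is None else state | {atom}
            stack.append((new_state, happened | {i}, weight * prob))
    return leaves

first = final_distribution(THEORY, min)   # always fire the lowest-numbered law
last = final_distribution(THEORY, max)    # always fire the highest-numbered law
assert first == last
print(sorted((sorted(s), p) for s, p in first.items()))
```

The two policies produce trees of different shape, yet the distributions over final states coincide, as the result stated above predicts.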

A first contribution of this paper is the language CP-logic itself. This language 
allows one to represent a probability distribution over possible states of a domain by
an enumeration of the probabilistic causal laws according to which it is generated. 
As we tried to show by, among others, the comparison to Bayesian networks, this 
representation is often natural and concise. A second contribution is that we have 
shown that CP-logic can be equivalently defined as a probabilistic logic program- 
ming language. Because both the meaning of statements in CP-logic, as well as 
their formal semantics, can be completely explained in terms of intuitions about 
probabilistic causal laws, this formal equivalence offers a new way of (informally) 
explaining the meaning of probabilistic logic programs. This is a useful contribution 
to the existing modeling methodology for such languages. 

By relating causality and logic programming in this way, our paper also serves as 
a unifying semantic study of existing probabilistic and non-probabilistic logics and
formalisms. We showed how CP-logic refines causal Bayesian networks and several
logics based on them. We also elaborated on the links between CP-logic and
existing logic programming extensions such as ICL, PRISM and LPADs, thus showing
that these logics can also be viewed as causal probabilistic logics. For example, a
theory in ICL can be understood as a combination of deterministic causal events
and unconditional probabilistic events. As for logic programming itself, we showed 
that CP-logic induces a causal view on this formalism, in which rules represent de- 
terministic causal events. We also argued that this view basically coincides with the 
view of logic programs as inductive definitions. To be more concrete, we have shown 
that a normal logic program under the well-founded semantics can be understood
as a set of deterministic causal statements and we have presented an alternative
semantics for disjunctive logic programs (similar to that of (Sakama and Inoue 1994))
under which these can be interpreted as sets of non-deterministic causal events.

This paper is primarily intended to show how the concept of a probabilistic 
causal law can be formalized in a logical language, and to demonstrate the close 
relation of such a language to probabilistic logic programs. Because of this, we have 
intentionally kept our language quite simple. As became apparent in the comparison 
with other logics, such as P-log, CP-logic therefore lacks the expressivity to be 
truly useful for a broad class of applications. To make it more suitable for practical 
purposes, it should therefore be improved in a number of ways. We see the following 
opportunities for future research. 

Refinement of CP-logic. The current language of CP-logic is restricted in a number 
of ways. First, it only allows a finite number of CP-laws. Let us consider, for in- 
stance, a die that is rolled as long as it takes to obtain a six. Here, there is no upper 
bound on the number of throws that might be needed and, therefore, this example 
can currently not be represented in CP-logic. Second, CP-logic is also limited in its 
representation of the effects of an event. For instance, it is not possible to directly 
represent events whose range of possible outcomes is not completely fixed before- 
hand. Also, we currently do not allow different events to cancel out or reinforce 
each other's effects. Third, and somewhat related to the previous point, CP-logic 
currently can only handle properties that are either fully present or fully absent. 
As such, it cannot correctly represent causes which have only a contributory effect, 
e.g., turning on a tap would not instantaneously cause a basin to be full, but only 
contribute a certain amount per time unit. 

Integration into a larger formalism. To correctly formalise a domain in CP-logic, 
a user must exactly know the causes and effects of all relevant events that might 
happen. For real domains of any significant size, this is an unrealistic assumption. 
Indeed, typically, one will only have such detailed knowledge about certain parts 
of a domain. So, in order to still be able to use CP-logic in such a setting, it would 
have to be integrated with other forms of knowledge. There are some obvious candi- 
dates for this: statements about the probabilities of certain properties, statements 
about probabilistic independencies (such as those in Bayesian networks), and con- 
straints on the possible states of the domain. Integrating these different forms of 
knowledge without losing conceptual clarity is one of the main challenges for future 
work regarding CP-logic, and perhaps even for the area of uncertainty in artificial 
intelligence as a whole. 

Inference. The most obvious inference task in the context of CP-logic is calculating 
the probability $\pi_C(\varphi)$ of a formula $\varphi$. A straightforward way of doing this would be
to exploit the relation between CP-logic and (probabilistic) logic programming, such
that we perform these computations by reusing existing algorithms (e.g., the
inference algorithm of Poole's independent choice logic (Poole 1997)) in an appropriate
way. A more advanced technique, using binary decision diagrams, has recently been
developed in (Riguzzi 2007). Another interesting inference task concerns the
construction of a theory in CP-logic. For probabilistic modeling languages in general,
it is typically not desirable that a user is forced to estimate or compute concrete 
probability values herself; instead, it should be possible to automatically derive 
these from a given data set. For CP-logic, there already exist algorithms that are 
able to do this in certain restricted cases (Riguzzi 2004; Blockeel and Meert 2007).
It would be interesting to generalize these, in order to make them generally appli- 
cable. Besides such learning of probabilistic parameters, it is also possible to learn 
the structure of the theory itself. This too is an important topic, because if we are 
able to construct the theory that best describes a given data set, we are in effect 
finding out which causal mechanisms are most likely present in this data. Such 
information can be relevant for many domains. For instance, when bio-informatics 
attempts to distinguish active from non-active compounds, this is exactly the kind 
of information that is needed. In (Meert et al. 2007), it is discussed how certain
Bayesian network learning techniques can be adapted to perform structure learning 
for ground CP-logic.

Appendix A Proofs of the theorems

In this section, we present proofs of the theorems that were stated in the previous 
section. To ease notation, we will assume that there are no exogenous predicates. 
This can be done without loss of generality, since all our results can simply be 
relativized with respect to some fixed interpretation for these predicates. 

A.1 The semantics is well-defined

We start by proving that the semantics of CP-logic, and in particular the partial
interpretation $\nu_s$, the potential in $s$, used in the additional condition imposed by
Definition 11 for handling negation, is indeed well-defined. Since we defined $\nu_s$ as
the unique limit of all terminal hypothetical derivation sequences of $s$, this requires
us to show that all such sequences indeed end up in the same limit (Theorem 4).

Let us consider a CP-theory $C$ and a state $s$ in an execution model of $C$. We will
denote by $\mathcal{R}(s)$ the set of all CP-laws $r \in C$ that have not yet happened in $s$, i.e.,
for which there is no ancestor $s'$ of $s$ with $\mathcal{E}(s') = r$. Consider the collection $\mathcal{O}_s$
of all partial interpretations $\nu$ such that for each atom $p$, $p^{\nu} = t$ iff $p^{\mathcal{I}(s)} = t$, and
for each rule $r \in \mathcal{R}(s)$, if $body(r)^{\nu} \neq f$, then for each atom $p \in head_{At}(r)$, $p^{\nu} \neq f$.
Stated differently, $\nu$ can be obtained from $\mathcal{I}(s)$ by turning false atoms of $\mathcal{I}(s)$ into
unknown atoms in such a way that if the body of some rule $r \in \mathcal{R}(s)$ is unknown
or true in $\nu$, then each of its head atoms is unknown or true in $\nu$ as well.

Proposition 1

Let $(\nu_i)_{0 \leq i \leq n}$ be a hypothetical derivation sequence in state $s$.

• For each $0 \leq i \leq n$ and each $\nu \in \mathcal{O}_s$ it holds that $\nu \leq_p \nu_i$.

• The limit $\nu_n = \nu_s$ is an element of $\mathcal{O}_s$.

Proof 

The first property can be proven by a straightforward induction. Clearly, it holds
that $\nu \leq_p \nu_0 = \mathcal{I}(s)$. Assume $\nu \leq_p \nu_i$ for some $i < n$. The true atoms of $\nu$ and $\nu_{i+1}$
are those of $\mathcal{I}(s)$, so they are the same. Therefore, it suffices to show that every
atom $p$ that is false in $\nu$ is also false in $\nu_{i+1}$, or, since $\nu$ and $\nu_{i+1}$ have the same true
atoms, that every such $p$ is not unknown in $\nu_{i+1}$. Assume towards contradiction that
$p$ is false in $\nu$ and unknown in $\nu_{i+1}$. By the induction hypothesis, $p$ is still false in
$\nu_i$. Therefore, $p$ belongs to the head of some rule $r \in \mathcal{R}(s)$ such that $body(r)^{\nu_i} \neq f$.
Since $\nu \leq_p \nu_i$, this would imply that $body(r)^{\nu} \neq f$, which, given that $\nu \in \mathcal{O}_s$, leads
to the contradiction that $p^{\nu} \neq f$. Hence, $p$ is false in $\nu_{i+1}$. It follows that $\nu \leq_p \nu_{i+1}$.

As for the second property, it is clear that $\nu_s$ can be obtained from $\mathcal{I}(s)$ by turning
some false atoms into unknown atoms, and that there are no more rules $r \in \mathcal{R}(s)$
with a non-false body and false atoms in the head w.r.t. $\nu_s$. Hence, $\nu_s \in \mathcal{O}_s$. □
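
As an illustration of Proposition 1, the following Python sketch computes the limit of a terminal hypothetical derivation sequence for a small ground example and checks that this limit belongs to $\mathcal{O}_s$. The rule encoding (head atoms, positive body atoms, negative body atoms), the function names and the example rules are assumptions made for this illustration only; the sketch is not an implementation taken from the paper.

```python
# Truth values of three-valued logic, ordered f < u < t.
ORDER = {"f": 0, "u": 1, "t": 2}
NEG = {"t": "f", "f": "t", "u": "u"}

def body_value(rule, interp):
    """Three-valued value of a conjunctive body under a partial interpretation."""
    _, pos, neg = rule
    vals = [interp[a] for a in pos] + [NEG[interp[a]] for a in neg]
    return min(vals, key=ORDER.get, default="t")

def potential(state_interp, pending_rules):
    """Turn false head atoms of rules with a non-false body into unknown, until stable."""
    nu = dict(state_interp)
    changed = True
    while changed:
        changed = False
        for rule in pending_rules:
            if body_value(rule, nu) != "f":
                for h in rule[0]:
                    if nu[h] == "f":
                        nu[h] = "u"
                        changed = True
    return nu

def in_O(nu, state_interp, pending_rules):
    """Membership test for the collection O_s, following its definition."""
    same_true = all((nu[a] == "t") == (state_interp[a] == "t") for a in state_interp)
    closed = all(
        body_value(r, nu) == "f" or all(nu[h] != "f" for h in r[0])
        for r in pending_rules
    )
    return same_true and closed

# Hypothetical example: I(s) makes every atom false; three rules have not yet happened.
state = {"a": "f", "b": "f", "c": "f", "d": "f"}
pending = [
    (["a"], [], ["b"]),   # a <- not b
    (["b"], ["a"], []),   # b <- a
    (["c"], ["d"], []),   # c <- d
]
nu = potential(state, pending)
```

In this example the atoms `a` and `b` become unknown while `c` and `d` stay false, and visiting the rules in reverse order yields the same limit, in line with the claim that all terminal sequences converge.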

We can now use this set $\mathcal{O}_s$ to characterize the limit $\nu_n$ of any hypothetical
derivation sequence $(\nu_i)_{0 \leq i \leq n}$ in $s$.

Theorem 10 

Let $(\nu_i)_{0 \leq i \leq n}$ be a hypothetical derivation sequence in $s$ and let $\nu$ be the least
upper bound of $\mathcal{O}_s$ w.r.t. the precision order $\leq_p$. Then $\nu_n = \nu$.

Proof 

It is obvious that $\nu$ itself also belongs to $\mathcal{O}_s$. Therefore, by the first bullet of
Proposition 1, $\nu \leq_p \nu_n$. Because, by the second bullet of Proposition 1, $\nu_n$ also belongs
to $\mathcal{O}_s$, we have that $\nu \geq_p \nu_n$ as well. □

Since this theorem shows that all hypothetical derivation sequences converge to
the most precise element of $\mathcal{O}_s$, it implies Theorem 4 and, therefore, our semantics
is indeed well-defined.

A.2 CP-logic and LPADs are equivalent

Let $C$ be an LPAD. Let us define a partial $C$-selection as a partial function $\sigma$
from $C$ mapping rules $r$ of a subset $dom(\sigma) \subseteq C$ to pairs $(p : \alpha) \in head^*(r)$. The
probability function of selections can be extended to partial selections by setting
$P(\sigma) = \prod_{r \in dom(\sigma)} \sigma^{\alpha}(r)$. Define also $S(\sigma)$ as the set of $C$-selections that extend $\sigma$.
The following equation is obvious:

\[ P(\sigma) = \sum_{\sigma' \in S(\sigma)} P(\sigma') \]

We define an instance of $\sigma$ as any instance $C^{\sigma'}$ in which $\sigma'$ is a $C$-selection that
extends $\sigma$.
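
The equation above can be checked numerically on a small example. The following Python sketch assumes a hypothetical encoding in which each rule is a list of (head atom, probability) alternatives, with `None` standing for the alternative in which the rule causes nothing, and in which a (partial) selection is a dictionary from rule indices to alternative indices. These names and the two example rules are illustrative assumptions, not part of the formal development.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Hypothetical LPAD heads: each rule lists its (atom, probability) alternatives.
rules = [
    [("a", Fraction(1, 2)), ("b", Fraction(1, 3)), (None, Fraction(1, 6))],
    [("c", Fraction(1, 4)), (None, Fraction(3, 4))],
]

def prob(selection):
    """P(sigma): product of the probabilities of the selected alternatives."""
    return prod((rules[r][k][1] for r, k in selection.items()), start=Fraction(1))

def extensions(partial):
    """S(sigma): all total selections that agree with the partial selection."""
    free = [r for r in range(len(rules)) if r not in partial]
    for choice in product(*(range(len(rules[r])) for r in free)):
        yield {**partial, **dict(zip(free, choice))}

partial = {0: 1}  # rule 0 selects its second alternative, rule 1 is undecided
total = sum(prob(s) for s in extensions(partial))
```

Because the probabilities of the alternatives of every rule sum to one, summing over the undecided rules leaves exactly the probability of the partial selection.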

Let $\mathcal{T}$ be an execution model of $C$. Clearly, each node $s$ in $\mathcal{T}$ determines a unique
partial $C$-selection, denoted $\sigma(s)$. Formally, if $(s_i)_{0 \leq i \leq n}$ is the path from the root
to $s$, then the domain of $\sigma(s)$ is $\{\mathcal{E}(s_i) \mid 0 \leq i < n\}$ and each rule $r = \mathcal{E}(s_i)$ in its
domain is mapped to the atom $p \in head^*(r)$ that was selected for $s_{i+1}$. Moreover,
we have

\[ \mathcal{P}(s) = P(\sigma(s)) = \sum_{\sigma' \in S(\sigma(s))} P(\sigma'). \tag{A1} \]

With the path $(s_i)_{0 \leq i \leq n}$ from the root to some node $s$, we now also associate a
sequence of partial interpretations $(K_j)_{j=0}^{2n+1}$ defined as follows:

• $K_0 = \bot$, the partial interpretation mapping all atoms to $u$.

• $K_{2i+1} = \nu_{s_i}$, for all $0 \leq i \leq n$.

• $K_{2i+2} = \nu_{s_i}[p : t]$, for all $0 \leq i < n$, where $p$ is the head atom of $\mathcal{E}(s_i)$ selected
to obtain $s_{i+1}$.

Proposition 2 

For each $\sigma \in S(\sigma(s))$, $(K_j)_{j=0}^{2n+1}$ is a well-founded induction of $C^{\sigma}$.
Proof 

The proof is by induction on the length $n$ of the path from the root of $\mathcal{T}$ to $s$.

We start by proving that $(K_j)_{j=0}^{2n}$ is a well-founded induction of all instances $C^{\sigma}$
with $\sigma \in S(\sigma(s))$. If $n = 0$, then $s$ is the root of the tree and $\sigma(s)$ is the empty
partial selection. The sequence $(K_0)$ is obviously a well-founded induction of any
instance $C^{\sigma}$. For $n > 0$, the induction hypothesis states that $(K_j)_{j=0}^{2n-1}$ is a
well-founded induction of all instances $C^{\sigma}$, where $\sigma$ belongs to $S(\sigma(s_{n-1}))$. Let $r$ be
$\mathcal{E}(s_{n-1})$, the rule selected in $s_{n-1}$, and let $K_{2n} = K_{2n-1}[p : t]$ where $p$ was selected
in the head of $r$ to obtain $s$. Hence, $body(r)$ is true in $K_{2n-1} = \nu_{s_{n-1}}$. Clearly, for
each $\sigma \in S(\sigma(s))$, $C^{\sigma}$ contains the rule $p \leftarrow body(r)$. Consequently, $(K_j)_{j=0}^{2n}$ is a
well-founded induction of $C^{\sigma}$.

Next, we prove that $(K_j)_{j=0}^{2n+1}$ is a well-founded induction of all $C^{\sigma}$ with
$\sigma \in S(\sigma(s))$. Let us investigate the set $U$ of all atoms $q$ such that $K_{2n}(q) \neq K_{2n+1}(q)$.
We will prove that all atoms of $U$ are unknown in $K_{2n}$ and false in $K_{2n+1}$ and that
$U$ is an unfounded set of $C^{\sigma}$. It then will follow that $(K_j)_{j=0}^{2n+1}$ is a well-founded
induction of $C^{\sigma}$.

Let us first verify that all atoms in $U$ are unknown in $K_{2n}$ and false in $K_{2n+1}$.
If $n = 0$, then $K_0 = \nu_0 = K_1$, so $U = \{\}$ and the statement trivially holds. Let
$n > 0$. Recall that $K_{2n}$ is $\nu_{s_{n-1}}[p : t]$, where $p$ is the atom selected in the head of
$\mathcal{E}(s_{n-1})$ to obtain $s$, and $K_{2n+1} = \nu_s$. It is easy to see that the true atoms of $K_{2n}$
and $K_{2n+1}$ are identical to those true in $\mathcal{I}(s)$. Hence, $K_{2n}$ and $K_{2n+1}$ only differ
on false or unknown atoms. To show that $U$ contains only atoms that are unknown
in $K_{2n}$ and false in $K_{2n+1}$, it therefore suffices to show that all atoms false in $K_{2n}$
are also false in $K_{2n+1}$. To prove this, it suffices to show that $K_{2n} \in \mathcal{O}_s$. Indeed, if
$K_{2n} \in \mathcal{O}_s$, Proposition 1 entails that $\nu_s \geq_p K_{2n}$ and hence, all atoms false in $K_{2n}$
are false in $\nu_s = K_{2n+1}$.

We observe that, since $\nu_{s_{n-1}}$ belongs to $\mathcal{O}_{s_{n-1}}$ (Proposition 1), all head atoms of
rules $r \in \mathcal{R}(s_{n-1})$ with a non-false body in $\nu_{s_{n-1}}$ are true or unknown in $\nu_{s_{n-1}}$.
In particular, $\mathcal{E}(s_{n-1}) \in \mathcal{R}(s_{n-1})$ and has a true body in $\nu_{s_{n-1}}$, hence $p$ is true or
unknown in $\nu_{s_{n-1}}$. It follows that:

\[ \nu_{s_{n-1}} \leq_p \nu_{s_{n-1}}[p : t] = K_{2n}. \]

It follows that any rule $r \in \mathcal{R}(s) \subseteq \mathcal{R}(s_{n-1})$ with a non-false body in $K_{2n}$ has
a non-false body in $\nu_{s_{n-1}}$; hence, all atoms in the head of such an $r$ are true or
unknown in $\nu_{s_{n-1}}$ and, a fortiori, in $K_{2n} = \nu_{s_{n-1}}[p : t]$. Thus, we obtain that
$K_{2n} \in \mathcal{O}_s$, as desired.

So far, we have proven that $K_{2n+1} = K_{2n}[U : f]$ and that all elements in $U$
are unknown in $K_{2n}$. It follows that $K_{2n} \leq_p K_{2n+1}$ and, more generally, that
$K_j \leq_p K_{2n+1}$, for all $j \leq 2n$. All that remains to be shown is that $U$ is an unfounded
set of each instance of $\sigma(s)$. Let $C^{\sigma}$ be such an instance and for any atom $q \in U$,
let $q \leftarrow \phi$ be a rule of $C^{\sigma}$. We need to show that $\phi$ is false in $K_{2n+1}$. The rule
is obtained as an instance of some rule $r \in C$ with $q$ in its head. The rule $r$ is
not one of the rules $\mathcal{E}(s_i)$ with $i < n$, since otherwise $q$ would be true in $\mathcal{I}(s_j)$ for
all $j > i$ and, in particular, also in $\nu_{s_n} = K_{2n+1}$, which would contradict the fact
that we have already shown $q$ to be false in $K_{2n+1}$. It follows that $r \in \mathcal{R}(s)$. Since
$K_{2n+1} = \nu_s \in \mathcal{O}_s$ and $q$ is false in $\nu_s$, $body(r) = \phi$ is false in $K_{2n+1}$.
□

Proposition 3 

For each leaf $l$ of an execution model $\mathcal{T}$ of $C$, $\mathcal{I}(l)$ is the well-founded model of each
instance $C^{\sigma}$ with $\sigma \in S(\sigma(l))$.

Proof 

Let $l$ be a leaf and $\sigma \in S(\sigma(l))$. By Proposition 2, $(K_j)_{j=0}^{2n+1}$ is a well-founded
induction of $C^{\sigma}$. Because $l$ is a leaf, we have that for every rule $r \in \mathcal{R}(l)$, $body(r)$ is
false in $\mathcal{I}(l)$. Therefore, $\mathcal{I}(l) \in \mathcal{O}_l$ and Proposition 1 states that $\nu_l \geq_p \mathcal{I}(l)$. However,
because $\mathcal{I}(l)$ is two-valued, this implies that $\nu_l = \mathcal{I}(l)$. Therefore, $K_{2n+1} = \nu_l$ is a
total interpretation. Because a well-founded induction with a total limit is terminal,
$\mathcal{I}(l)$ is the well-founded model of $C^{\sigma}$. □

This now allows us to prove the desired equivalence between CP-logic and LPADs,
which was stated as a theorem earlier in the paper.

Theorem 11 

Let $\mathcal{T}$ be an execution model of a CP-theory $C$. For each interpretation $J$,

\[ \mu_C(J) = \pi_{\mathcal{T}}(J). \]

Proof 

Given an execution model $\mathcal{T}$ of a CP-theory $C$ (Def. 11), we associate to each node
$s$ of $\mathcal{T}$ the set $S(\sigma(s))$ of all those $C$-selections $\sigma$ (Def. 17) that extend $\sigma(s)$. It
is easy to see that, with $L_{\mathcal{T}}$ the set of all leaves of $\mathcal{T}$, the class $\{S(\sigma(l)) \mid l \in L_{\mathcal{T}}\}$
is a partition of the set $S_C$ of all selections. Let $L_{\mathcal{T}}(J)$ be the set of all leaves
$l$ of $\mathcal{T}$ for which $\mathcal{I}(l) = J$, and let $Sels(J)$ be the set of selections $\sigma$ such that
$WFM(C^{\sigma}) = J$. Because for each leaf $l$, the well-founded model of the instance $C^{\sigma}$ of
each selection $\sigma \in S(\sigma(l))$ is $\mathcal{I}(l)$ (Proposition 3), the class $\{S(\sigma(l)) \mid l \in L_{\mathcal{T}}(J)\}$ is a
partition of the collection $Sels(J)$. This now allows us to derive the following equation:

\begin{align*}
\mu_C(J) = \sum_{\sigma \in Sels(J)} P(\sigma) &= \sum_{l \in L_{\mathcal{T}}(J)} \sum_{\sigma \in S(\sigma(l))} P(\sigma) && \text{(by definition)} \\
&= \sum_{l \in L_{\mathcal{T}}(J)} \mathcal{P}(l) && \text{(see equation (A1))} \\
&= \pi_{\mathcal{T}}(J).
\end{align*}

□ 
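
The statement of Theorem 11 can be illustrated on a small example. The Python sketch below is restricted to ground theories without negation, an assumption made here so that the well-founded model of an instance coincides with its least model and no potential needs to be computed. It computes the instance-based distribution by enumerating all selections, computes the distribution of an execution model by expanding a probability tree, and compares the two. The rule encoding, the function names `mu` and `pi`, and the example theory are hypothetical and serve only this illustration.

```python
from collections import defaultdict
from fractions import Fraction as F
from itertools import product

# Hypothetical ground theory: (head alternatives, positive body atoms).
RULES = [
    ([("throws_suzy", F(1, 2))], []),
    ([("throws_mary", F(1))], []),
    ([("break", F(3, 5))], ["throws_suzy"]),
    ([("break", F(4, 5))], ["throws_mary"]),
]

def alternatives(rule):
    """Head alternatives, plus the 'nothing happens' alternative if it has mass."""
    heads, _ = rule
    rest = 1 - sum(p for _, p in heads)
    return list(heads) + ([(None, rest)] if rest > 0 else [])

def least_model(definite_rules):
    model, changed = set(), True
    while changed:
        changed = False
        for head, body in definite_rules:
            if head not in model and set(body) <= model:
                model.add(head)
                changed = True
    return frozenset(model)

def mu(rules):
    """Instance-based semantics: sum P(sigma) over selections, grouped by model."""
    dist = defaultdict(F)
    for choice in product(*(alternatives(r) for r in rules)):
        p, instance = F(1), []
        for (head, q), (_, body) in zip(choice, rules):
            p *= q
            if head is not None:
                instance.append((head, body))
        dist[least_model(instance)] += p
    return dict(dist)

def pi(rules, pick=lambda ready: ready[0]):
    """Distribution over the leaves of one execution model (probability tree)."""
    dist = defaultdict(F)
    def expand(true_atoms, fired, p):
        ready = [i for i, (_, body) in enumerate(rules)
                 if i not in fired and set(body) <= true_atoms]
        if not ready:                      # leaf: no applicable rule is left
            dist[true_atoms] += p
            return
        i = pick(ready)
        for head, q in alternatives(rules[i]):
            nxt = true_atoms | {head} if head is not None else true_atoms
            expand(nxt, fired | {i}, p * q)
    expand(frozenset(), frozenset(), F(1))
    return dict(dist)

p_break = sum(p for model, p in mu(RULES).items() if "break" in model)
```

The `pick` argument chooses which applicable rule happens next; changing it produces a different execution model of the same theory, and for this example the resulting distribution is unchanged.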

For any execution model $\mathcal{T}$ of $C$, this theorem now characterizes the probability
distribution $\pi_{\mathcal{T}}$ in a way that depends only on $C$ and not on $\mathcal{T}$ itself. It follows
that, indeed, for all execution models $\mathcal{T}$ and $\mathcal{T}'$ of $C$, $\pi_{\mathcal{T}} = \pi_{\mathcal{T}'}$, which means that
we have now also proven Theorem 6 (and, therefore, Theorem 2 as well).

A.3 Execution models that follow the timing

In this section, we will prove Theorem 7, which states that every stratified
CP-theory has an execution model which follows its stratification. Recall that a
CP-theory is stratified if it strictly respects some timing $\lambda$ (i.e., for all $h \in head_{At}(r)$
and $b \in body^{+}_{At}(r)$, $\lambda(h) \geq \lambda(b)$ and for all $h \in head_{At}(r)$ and $b \in body^{-}_{At}(r)$,
$\lambda(h) > \lambda(b)$). As we did in Definition 9 of Section 4, we will again introduce an
event-timing $\kappa$ of $\lambda$ (i.e., $\kappa$ maps rules to time points in such a way that $\lambda(h) \geq \kappa(r) \geq \lambda(b)$
for all $h \in head_{At}(r)$ and $b \in body_{At}(r)$). Moreover, we assume that $\kappa$ is such that
for all $b \in body^{-}_{At}(r)$, $\kappa(r) > \lambda(b)$. It can easily be seen that for any stratified theory
$C$, it is always possible to find such a $\kappa$.

Our goal is now to show that, first, all weak execution models that follow $\kappa$ (Def.
5) also satisfy temporal precedence and, second, that such a process indeed exists.

Let us start by making some general observations about any weak execution
model $\mathcal{T}$ that follows $\kappa$. For any descendant $s'$ of a node $s$ of $\mathcal{T}$, it is, by definition,
the case that $\kappa(\mathcal{E}(s')) \geq \kappa(\mathcal{E}(s))$. Because every event $r$ can only affect the truth
value of atoms with timing $\geq \kappa(r)$, it must be the case that, for each rule $r$ with
timing $< \kappa(\mathcal{E}(s))$, $body(r)^{\mathcal{I}(s)} = body(r)^{\mathcal{I}(s')}$. Suppose now that for such an $r$ it
would be the case that $r \in \mathcal{R}(s)$ and $\mathcal{I}(s) \models body(r)$, i.e., $r$ is an event that
could also have happened in $s$. In this case, $body(r)$ would remain satisfied in all
descendants of $s$, up to and including each leaf $l$ that might be reached. However, it
is impossible that $r$ actually happens in some descendant $s'$ of $s$, since that would
violate the constraint that $\kappa(\mathcal{E}(s')) \geq \kappa(\mathcal{E}(s))$. So, it would be the case that $r \in \mathcal{R}(l)$
and $\mathcal{I}(l) \models body(r)$, which would contradict the fact that $l$ is a leaf. We conclude
that such an $r$ cannot exist, i.e., for each $s$, it must be the case that $\mathcal{E}(s)$ is a rule
with minimal timing among all rules $r \in \mathcal{R}(s)$ for which $\mathcal{I}(s) \models body(r)$.

Let us now assume that we non-deterministically construct a probabilistic
$\Sigma$-process $\mathcal{T}$ as follows:

• We start with only a root $s$, with $\mathcal{I}(s) = \{\}$;

• As long as one exists, we select a leaf $s$ of our current tree, for which the set
of rules $r \in \mathcal{R}(s)$ such that $\mathcal{I}(s) \models body(r)$ is non-empty. We then extend $\mathcal{T}$
by executing one of the rules whose timing is minimal in this set.
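
The construction above can be sketched in Python as follows. The sketch assumes a hypothetical encoding of ground rules as tuples (head alternatives, positive body, negative body, event-timing), and it resolves the non-deterministic choice by taking the first rule of minimal timing. It records the timings of the events along each branch, so that one can check that they never decrease. All names and the two-rule example are assumptions made for illustration.

```python
from fractions import Fraction as F

# Hypothetical stratified theory: (head alternatives, pos. body, neg. body, kappa).
RULES = [
    ([("a", F(1, 2))], [], [], 1),        # (a : 0.5) <- .
    ([("b", F(1, 2))], [], ["a"], 2),     # (b : 0.5) <- not a.
]

def ready(rule, true_atoms):
    """The body holds in the two-valued interpretation given by true_atoms."""
    _, pos, neg, _ = rule
    return set(pos) <= true_atoms and not (set(neg) & true_atoms)

def build(true_atoms=frozenset(), fired=frozenset(), p=F(1), trace=()):
    """Yield (leaf interpretation, probability, timings of events) per branch."""
    candidates = [i for i, r in enumerate(RULES)
                  if i not in fired and ready(r, true_atoms)]
    if not candidates:
        yield true_atoms, p, trace
        return
    # Execute a rule whose timing is minimal among the applicable ones.
    i = min(candidates, key=lambda k: RULES[k][3])
    heads = RULES[i][0]
    rest = 1 - sum(q for _, q in heads)
    for head, q in list(heads) + ([(None, rest)] if rest > 0 else []):
        nxt = true_atoms | {head} if head is not None else true_atoms
        yield from build(nxt, fired | {i}, p * q, trace + (RULES[i][3],))

leaves = list(build())
```

In this example the rule for `a` always happens before the rule for `b`, so the negative literal `not a` is only evaluated once the truth value of `a` can no longer change.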

As shown in the previous paragraph, all weak execution models that follow $\kappa$
can be constructed in this way. Conversely, each process $\mathcal{T}$ that we can construct
in this way can easily be seen to also be a weak execution model. Moreover, it
is again easy to see that for all descendants $s'$ of $s$ of such a $\mathcal{T}$ and each rule $r$
with timing $< \kappa(\mathcal{E}(s))$, $body(r)^{\mathcal{I}(s)} = body(r)^{\mathcal{I}(s')}$. Therefore, as we go along any
particular branch of $\mathcal{T}$, the minimum timing of all rules with true body can only
increase, which means that each process constructed in the above way must follow
$\kappa$. So, this provides an alternative, constructive characterization of the set of all
weak execution models that follow $\kappa$. An immediate consequence is that there exist
such processes. Therefore, it now suffices to show that all these processes also satisfy
temporal precedence.

Proposition 4 

Each weak execution model $\mathcal{T}$ that follows the timing $\kappa$ also satisfies temporal
precedence and is, therefore, an execution model.

Proof 

We need to show that, for each node $s$ of $\mathcal{T}$, $\nu_s(body(\mathcal{E}(s))) = t$. In general, applying
an event with timing $\geq i$ during a hypothetical derivation sequence only modifies
atoms with timing $\geq i$, and hence, can only modify the truth value of bodies of
events with timing $\geq i$. Let $r = \mathcal{E}(s)$ and $i = \kappa(r)$. Because, in the first step $\nu_0$ of a
sequence constructing $\nu_s$, the only events that can be used are those $r' \in \mathcal{R}(s)$ for
which $\mathcal{I}(s) \models body(r')$ and we know that the time of $\mathcal{E}(s)$ is minimal among these
events, we conclude that $\mathcal{I}(s)$ and $\nu_s$ coincide on all atoms $p$ with timing $\lambda(p) < i$.
Because $C$ strictly respects $\lambda$, all atoms $p \in body^{-}_{At}(r)$ therefore have the same truth
value in $\nu_s$ as in $\mathcal{I}(s)$. Moreover, $\mathcal{I}(s) \leq_t \nu_s$, so, in particular, for all atoms
$p \in body^{+}_{At}(r)$, $\mathcal{I}(s)(p) \leq_t \nu_s(p)$. By a well-known monotonicity property of
three-valued logic, $t = body(r)^{\mathcal{I}(s)} \leq_t body(r)^{\nu_s}$. Hence, $body(r)^{\nu_s}$ is indeed $t$. □
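
The monotonicity property used in the last step can be checked exhaustively for a small body. The following Python sketch assumes that a body is a conjunction of positive and negated atoms, evaluated under the usual three-valued reading with truth order $f <_t u <_t t$. It verifies that raising the positive atoms in the truth order, while leaving the negated atoms fixed, never lowers the value of the body. The function names and the example body are hypothetical.

```python
from itertools import product

RANK = {"f": 0, "u": 1, "t": 2}           # truth order f <_t u <_t t
NEG = {"f": "t", "u": "u", "t": "f"}

def body_value(pos, neg, interp):
    """Three-valued value of the conjunction of pos atoms and negated neg atoms."""
    vals = [interp[a] for a in pos] + [NEG[interp[a]] for a in neg]
    return min(vals, key=RANK.get, default="t")

def monotone(pos, neg):
    """Check body^I <=_t body^J whenever I <=_t J on pos and I = J on neg."""
    atoms = pos + neg
    for low in product("fut", repeat=len(atoms)):
        for high in product("fut", repeat=len(atoms)):
            i, j = dict(zip(atoms, low)), dict(zip(atoms, high))
            raised = all(RANK[i[a]] <= RANK[j[a]] for a in pos)
            fixed = all(i[a] == j[a] for a in neg)
            if raised and fixed:
                if RANK[body_value(pos, neg, i)] > RANK[body_value(pos, neg, j)]:
                    return False
    return True

holds = monotone(["p", "q"], ["r"])
```

The requirement that the negated atoms stay fixed mirrors the role of strict stratification in the proof: it is what guarantees that the atoms in the negative body have the same truth value in the potential as in the state.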

This concludes our proof of Theorem 7. Since this theorem clearly generalizes
Theorem 3, we have now proven all theorems stated in this paper.